Re: [PATCH] dax/hmem: Fix refcount leak in dax_hmem_probe()

2023-06-02 Thread Paul Cassella
On Fri, 2 Jun 2023, Ira Weiny wrote:
> Paul Cassella wrote:
> > On Sat, 3 Dec 2022, Ira Weiny wrote:
> > > On Sat, Dec 03, 2022 at 09:58:58AM +, Yongqiang Liu wrote:

> > > > We should always call dax_region_put() whenever devm_create_dev_dax()
> > > > succeed or fail to avoid refcount leak of dax_region. Move the return
> > > > value check after dax_region_put().

> > > I think dax_region_put is called from dax_region_unregister() 
> > > automatically on
> > > tear down.

> > Note the reference dax_region_unregister() will be putting is the one 
> > devm_create_dev_dax() takes by kref_get(_region->kref).   I think 
> > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > successful case.

> Looking at this again I'm inclined to agree that something is wrong.  But
> I'm not sure this patch fixes it. anything.

> When you say:
> 
> > ...   I think 
> > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > successful case.
> 
> ... which kref_get() is dax_hmem_probe() letting go?

Isn't it letting go of the initial kref_init() reference from 
alloc_dax_region()?

Sorry, I had gone through the code a little carelessly yesterday.  Now I 
think that kref_init() reference is the one that dax_hmem_probe() is 
dropping in the success case, and which still needs to be dropped in the 
error case.

(If so, I think the alloc_dax_region() path that returns NULL on 
devm_add_action_or_reset() failure, releasing the kref_get reference, will 
leak the kref_init reference and the region.)



-- 
Paul Cassella


> Here is what I see with the current code.
> 
> dax_hmem_probe()
>   alloc_dax_region()
>   kref_get(_region->kref)
>   devm_add_action_or_reset(... dax_region_unregister, ...)
>   ... kref_put() later...
> 
>   devm_create_dev_dax()
>   ...may return error...
>   kref_get()
>   [dev_dax_release() set to call kref_put() later]
>   ...may return error...
> 
>   if not error
>   dax_region_put() => kref_put()
> 
> I think this is an extra unneeded put???
> 
> Dan I see this pattern repeated in cxl and pmem.  Is the intent to remove
> the need for dax_region_unregister() to be called when the platform device
> unwinds because the platform device is not intended to own the dax_region
> after success?  If so it seems like the device managed call should be
> removed too.  Not just calling dax_region_put()?  :-/
> 
> But wouldn't that cause an issue with the sysfs entries created?
> 
> > 
> > Consider, devm_create_dev_dax() has error paths that return without 
> > involving dax_region_unregister(), prior to kref_get() and device_add().  
> > dax_hmem_probe() is clearly responsible for freeing the region in those 
> > cases.
> 
> No the devm_add_action_or_reset(... dax_region_unregister, ...) will cause
> dax_region_unregister() to release the reference when the platform device
> unwinds.
> 
> devm_create_dev_dax() configures a reference release through the
> dev_dax->type release.  So when the dev_dax device is put the dax_region
> will be released through dev_dax_release()->dax_region_put().
> 
> > 
> > 
> > dax_hmem_probe() drops its own reference in the successful case because 
> > (per the comment) "child dev_dax instances now own the lifetime of the 
> > dax_region".  That ownership is the reference that the error-case 
> > dax_region_unregister() is dropping.
> 
> No dax_region_unregister() is not just an error case flow.
> 
> >
> > dax_hmem_probe()'s initial reference 
> > also needs to be dropped in the error case, as in the successful case.
> > 
> 
> I don't follow this.  Doesn't this now result in an invalid reference
> release in dax_region_unregister() when the platform device is unwound?
> Furthermore, that reference is required I think for the sysfs entries.
> 
> > 
> > > > Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device 
> > > > to hmem_register_device")
> > > 
> > > I'm also not sure how this patch is related to this fix.
> > > 
> > > Ira
> > > 
> > > > Signed-off-by: Yongqiang Liu 
> > > > ---
> > > >  drivers/dax/hmem/hmem.c | 5 ++---
> > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> > > > index 1bf040dbc834..09f5cd7b6c8e 100644
> > > > --- a/drivers/dax/hmem/

Re: [PATCH] dax/hmem: Fix refcount leak in dax_hmem_probe()

2023-06-01 Thread Paul Cassella
On Sat, 3 Dec 2022, Ira Weiny wrote:
> On Sat, Dec 03, 2022 at 09:58:58AM +, Yongqiang Liu wrote:

> > We should always call dax_region_put() whenever devm_create_dev_dax()
> > succeed or fail to avoid refcount leak of dax_region. Move the return
> > value check after dax_region_put().

> I think dax_region_put is called from dax_region_unregister() automatically on
> tear down.

Hi Ira,

Note the reference dax_region_unregister() will be putting is the one 
devm_create_dev_dax() takes by kref_get(_region->kref). I think 
dax_hmem_probe() needs to put its reference in the error case, as in the 
successful case.

Consider, devm_create_dev_dax() has error paths that return without 
involving dax_region_unregister(), prior to kref_get() and device_add().  
dax_hmem_probe() is clearly responsible for freeing the region in those 
cases.


dax_hmem_probe() drops its own reference in the successful case because 
(per the comment) "child dev_dax instances now own the lifetime of the 
dax_region".  That ownership is the reference that the error-case 
dax_region_unregister() is dropping.  dax_hmem_probe()'s initial reference 
also needs to be dropped in the error case, as in the successful case.


> > Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to 
> > hmem_register_device")
> 
> I'm also not sure how this patch is related to this fix.
> 
> Ira
> 
> > Signed-off-by: Yongqiang Liu 
> > ---
> >  drivers/dax/hmem/hmem.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> > index 1bf040dbc834..09f5cd7b6c8e 100644
> > --- a/drivers/dax/hmem/hmem.c
> > +++ b/drivers/dax/hmem/hmem.c
> > @@ -36,12 +36,11 @@ static int dax_hmem_probe(struct platform_device *pdev)
> > .size = region_idle ? 0 : resource_size(res),
> > };
> > dev_dax = devm_create_dev_dax();
> > -   if (IS_ERR(dev_dax))
> > -   return PTR_ERR(dev_dax);
> >  
> > /* child dev_dax instances now own the lifetime of the dax_region */

This comment should probably be updated now.  :)


-- 
Paul Cassella


> > dax_region_put(dax_region);
> > -   return 0;
> > +
> > +   return IS_ERR(dev_dax) ? PTR_ERR(dev_dax) : 0;
> >  }
> >  
> >  static int dax_hmem_remove(struct platform_device *pdev)
> > -- 
> > 2.25.1
> > 
> > 
> 



Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-04-03 Thread Paul Cassella

On Wed, 28 Mar 2001, Paul Cassella wrote:

> Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

I've been running -ac27 for over 5 days, and it's been fine, so this seems
to have been fixed.

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-04-03 Thread Paul Cassella

On Wed, 28 Mar 2001, Paul Cassella wrote:

 Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

I've been running -ac27 for over 5 days, and it's been fine, so this seems
to have been fixed.

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Paul Cassella

On Thu, 29 Mar 2001, Alan Cox wrote:

> Was anything between 12 and 18 stable ?

I didn't actually try them; I jumped right from 12 to 18, and when that
and 19 died, I went back to 12. 

But a quick look suggests that the entire patch I'd applied to 12 and got
a hang with was in 13, including the pm.c change.

I also haven't tried anything after 24; is it likely to have been fixed?

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Paul Cassella

Earlier today, I wrote

> and no sysrq keys other than B worked; I didn't hear disk activity
> after S, and the disks weren't unmounted.  Nothing made it to the

Of course, when I rebooted this time (after SysRQ S,U,B), all the
filesystems were clean.

Nothing in the logs this time either though.

> When I get home and reboot (following this most recent hang :( ), I'll
> put the diff, .config, and more stuff from /proc at

>   http://manetheren.eigenray.com/~fortytwo/crash-12-18.2

This is now there.

-- 
Paul Cassella


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Paul Cassella
nd hardware information (/proc/ioports, /proc/iomem)

Under plain -ac12:

manetheren:/var/log/ksymoops# cat /proc/ioports 
-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0376-0376 : ide1
0378-037a : parport0
037b-037f : parport0
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(set)
0cf8-0cff : PCI conf1
d000-dfff : PCI Bus #01
  d800-d8ff : Lite-On Communications Inc LNE100TX
d800-d8ff : eth0
  df00-df3f : Ensoniq ES1371 [AudioPCI-97]
df00-df3f : es1371
ef80-ef9f : Intel Corporation 82801AA USB
  ef80-ef9f : usb-uhci
efa0-efaf : Intel Corporation 82801AA SMBus
ffa0-ffaf : Intel Corporation 82801AA IDE
  ffa0-ffa7 : ide0
  ffa8-ffaf : ide1
manetheren:/var/log/ksymoops# cat /proc/iomem   
-0009fbff : System RAM
0009fc00-0009 : reserved
000a-000b : Video RAM area
000c-000c7fff : Video ROM
000f-000f : System ROM
0010-07eb : System RAM
  0010-001cd2c5 : Kernel code
  001cd2c6-0020c2ff : Kernel data
07ec-07ef7fff : ACPI Tables
07ef8000-07ef : ACPI Non-volatile Storage
f6a0-f6af : PCI Bus #01
f800-fbff : Intel Corporation 82810-DC100 CGC [Chipset Graphics Controller]
ff80-ff8f : PCI Bus #01
  ff8ffc00-ff8ffcff : Lite-On Communications Inc LNE100TX
ff8ffc00-ff8ffcff : eth0
ffa8-ffaf : Intel Corporation 82810-DC100 CGC [Chipset Graphics Controller]
ffb8-ffbf : reserved
fff0- : reserved

[7.5.] PCI information ('lspci -vvv' as root)
manetheren:/var/log/ksymoops# lspci -vvv
00:00.0 Host bridge: Intel Corporation 82810-DC100 GMCH [Graphics Memory Controller 
Hub] (rev 02)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- Reset- FastB2B-

00:1f.0 ISA bridge: Intel Corporation 82801AA ISA Bridge (LPC) (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- i_size = offset;
if (inode->i_op && inode->i_op->truncate)
+   {
+   /* This doesnt scale but it is meant to be a 2.4 invariant */
+   lock_kernel();
inode->i_op->truncate(inode);
+   unlock_kernel();
+   }
return 0;
 out:
return -EFBIG;


A few lines earlier in this function, inode->i_op->truncate() is called
without lock_kernel().  Should it also have a lock_kernel(), or is it not
needed there? 

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Paul Cassella
Subsystem: Netgear FA310TX
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- 
MAbort- SERR- PERR-
Latency: 64
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at d800 [size=256]
Region 1: Memory at ff8ffc00 (32-bit, non-prefetchable) [size=256]
Expansion ROM at ff88 [disabled] [size=256K]

01:09.0 Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 06)
Subsystem: Ensoniq Creative Sound Blaster AudioPCI64V, AudioPCI128
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow TAbort- TAbort+ 
MAbort+ SERR- PERR-
Latency: 64 (3000ns min, 32000ns max)
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at df00 [size=64]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI+ D1- D2+ AuxCurrent=0mA 
PME(D0+,D1-,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-



BTW (and this isn't related to my crash because I didn't include this in
my diff) I noticed this in the difference between -ac12 and -ac18 (and I
believe it's still like this in -ac24):

diff -u linux.ac/mm/memory.c linux.ac/mm/memory.c
--- linux.ac/mm/memory.c
+++ linux.ac/mm/memory.c
@@ -978,7 +978,12 @@
}
inode-i_size = offset;
if (inode-i_op  inode-i_op-truncate)
+   {
+   /* This doesnt scale but it is meant to be a 2.4 invariant */
+   lock_kernel();
inode-i_op-truncate(inode);
+   unlock_kernel();
+   }
return 0;
 out:
return -EFBIG;


A few lines earlier in this function, inode-i_op-truncate() is called
without lock_kernel().  Should it also have a lock_kernel(), or is it not
needed there? 

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Paul Cassella

Earlier today, I wrote

 and no sysrq keys other than B worked; I didn't hear disk activity
 after S, and the disks weren't unmounted.  Nothing made it to the

Of course, when I rebooted this time (after SysRQ S,U,B), all the
filesystems were clean.

Nothing in the logs this time either though.

 When I get home and reboot (following this most recent hang :( ), I'll
 put the diff, .config, and more stuff from /proc at

   http://manetheren.eigenray.com/~fortytwo/crash-12-18.2

This is now there.

-- 
Paul Cassella


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Paul Cassella

On Thu, 29 Mar 2001, Alan Cox wrote:

 Was anything between 12 and 18 stable ?

I didn't actually try them; I jumped right from 12 to 18, and when that
and 19 died, I went back to 12. 

But a quick look suggests that the entire patch I'd applied to 12 and got
a hang with was in 13, including the pm.c change.

I also haven't tried anything after 24; is it likely to have been fixed?

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH:"No such process")

2001-01-11 Thread Paul Cassella

On Tue, 9 Jan 2001, Paul Cassella wrote:

> and mss_now seems to be less than skb->len when the printk happens.  My
> copy of K is at work; could that comparison be being done unsigned
> because of skb->len?  I wouldn't think so, but the alternative seems
> somewhat worse...

That'll teach me to post about integral promotions ...

> + printk(KERN_ERR "%s:%d:%s: err is unexpectedly %d.\n", file, line, 
>func, ret);

... and hand-edit patches before breakfast.


I'm not familiar enough with the tcp code to know if this patch (against
-ac6) is a solution, band-aid, or, in fact, wrong, but I've run with it
(on -ac3) and haven't seen the errors for over twelve hours, which is
three times longer than it had been able to go without it coming up.

--- tcp.c.orig  Thu Jan 11 08:54:50 2001
+++ tcp.c   Thu Jan 11 08:56:42 2001
@@ -954,7 +954,7 @@
 */
skb = sk->write_queue.prev;
if (tp->send_head &&
-   (mss_now - skb->len) > 0) {
+   (signed int)(mss_now - skb->len) > 0) {
copy = skb->len;
if (skb_tailroom(skb) > 0) {
int last_byte_was_odd = (copy % 4);


Or would this be better?

+   (unsigned int)mss_now > skb->len) {

Or making mss_now unsigned in the first place?

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH:No such process)

2001-01-11 Thread Paul Cassella

On Tue, 9 Jan 2001, Paul Cassella wrote:

 and mss_now seems to be less than skb-len when the printk happens.  My
 copy of KR is at work; could that comparison be being done unsigned
 because of skb-len?  I wouldn't think so, but the alternative seems
 somewhat worse...

That'll teach me to post about integral promotions ...

 + printk(KERN_ERR "%s:%d:%s: err is unexpectedly %d.\n", file, line, 
func, ret);

... and hand-edit patches before breakfast.


I'm not familiar enough with the tcp code to know if this patch (against
-ac6) is a solution, band-aid, or, in fact, wrong, but I've run with it
(on -ac3) and haven't seen the errors for over twelve hours, which is
three times longer than it had been able to go without it coming up.

--- tcp.c.orig  Thu Jan 11 08:54:50 2001
+++ tcp.c   Thu Jan 11 08:56:42 2001
@@ -954,7 +954,7 @@
 */
skb = sk-write_queue.prev;
if (tp-send_head 
-   (mss_now - skb-len)  0) {
+   (signed int)(mss_now - skb-len)  0) {
copy = skb-len;
if (skb_tailroom(skb)  0) {
int last_byte_was_odd = (copy % 4);


Or would this be better?

+   (unsigned int)mss_now  skb-len) {

Or making mss_now unsigned in the first place?

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH:"No such process")

2001-01-09 Thread Paul Cassella
  csum_and_copy_from_user(
from, skb_put(skb, copy),
copy, skb->csum, );
+   CHECK_TCP_RET();
}
/*
 * FIXME: the *_user functions should
@@ -1042,6 +1068,7 @@
}
if (signal_pending(current)) {
err = sock_intr_errno(timeo);
+ CHECK_TCP_RET();
goto do_interrupted;
}
timeo = wait_for_tcp_memory(sk, timeo);
@@ -1078,8 +1105,11 @@
skb->csum = csum_and_copy_from_user(from,
skb_put(skb, copy), copy, 0, );
 
-   if (err)
-   goto do_fault;
+   if (err) {
+ if(copied) check_tcp_ret(copied, __FILE__, __LINE__, 
+__FUNCTION__);
+ else CHECK_TCP_RET();
+ goto do_fault;
+   }
 
from += copy;
copied += copy;
@@ -1092,6 +1122,7 @@
}
}
err = copied;
+ CHECK_TCP_RET();
 out:
__tcp_push_pending_frames(sk, tp, mss_now, tp->nonagle);
 out_unlock:
@@ -1100,14 +1131,20 @@
return err;
 
 do_sock_err:
-   if (copied)
+   if (copied) {
err = copied;
-   else
+   CHECK_TCP_RET();
+
+   } else {
err = sock_error(sk);
+   CHECK_TCP_RET();
+   }
goto out;
 do_shutdown:
-   if (copied)
+   if (copied) {
err = copied;
+   CHECK_TCP_RET();
+   }
else {
if (!(flags_NOSIGNAL))
send_sig(SIGPIPE, current, 0);
@@ -1115,14 +1152,18 @@
}
goto out;
 do_interrupted:
-   if (copied)
+   if (copied) {
err = copied;
+   CHECK_TCP_RET();
+   }
goto out_unlock;
 do_fault:
__kfree_skb(skb);
 do_fault2:
-   if (copied)
+   if (copied) {
err = copied;
+   CHECK_TCP_RET();
+   }
else
err = -EFAULT;
goto out;
@@ -1283,6 +1324,8 @@
 
remove_wait_queue(sk->sleep, );
__set_current_state(TASK_RUNNING);
+   if(timeo < 0)
+ printk(KERN_ERR "tcp_data_wait: timeo == %ld.\n", timeo);
return timeo;
 }
 
@@ -2026,8 +2069,10 @@
if (sk->state != TCP_LISTEN)
break;
err = sock_intr_errno(timeo);
-   if (signal_pending(current))
+   if (signal_pending(current)) {
+   CHECK_TCP_RET();
    break;
+   }
err = -EAGAIN;
if (!timeo)
break;


-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH:"No such process")

2001-01-07 Thread Paul Cassella

On Mon, 8 Jan 2001, Andi Kleen wrote:

> On Sun, Jan 07, 2001 at 11:55:28PM -0600, Paul Cassella wrote:

> > write() returns -1 and sets errno non-sensically.  2.4.0{,-ac[23]}

> Would it be possible to provide a compiling test case that shows these
> errors ? 

The CVS version (perhaps even version 0.12) of gtk-gnutella should do it,
though it's not exactly what I'm running.  After the crash I posted about,
I've been running it for another three and a half hours under similar
load, and it hasn't crashed again yet; nor have I seen any other
unexpected ret's (besides those noted below) get logged.

http://gtk-gnutella.sourceforge.net/cvs/

Though I don't think gtk-gnutella is special.  It doesn't do anything
programmatically unusual with sockets, it just has a bunch of connections
to a bunch of different machines, which are probably running a bunch of
different os's, etc.  And it aborts when write() returns unexpected
values.

I've appended the actual kernel diffs that I'm using; I'd just pasted them
from an xterm before.  I probably should have added ECONNREFUSED and
ERESTARTSYS to the list.


> Also over what interface do you run it? 

eth0 (tulip):
01:08.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 21)


In answer to David's questions, I don't think I'm doing anything out of
the ordinary.  I'm running over DSL with the above card on an external DSL
router.  No netfilter, no tunneling, just IP.

CONFIG_PACKET=m
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK=y
CONFIG_RTNETLINK=y
CONFIG_NETLINK_DEV=m
# CONFIG_NETFILTER is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
CONFIG_INET_ECN=y
CONFIG_SYN_COOKIES=y
CONFIG_IPV6=m
# CONFIG_IPV6_EUI64 is not set
# CONFIG_KHTTPD is not set
# CONFIG_ATM is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_DECNET is not set
# CONFIG_BRIDGE is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_LLC is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set

CONFIG_DUMMY=m
CONFIG_TULIP=y

I have these set, but I haven't used them:
CONFIG_PPP=m
# CONFIG_PPP_MULTILINK is not set
CONFIG_PPP_ASYNC=m
# CONFIG_PPP_SYNC_TTY is not set
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m


eth0  Link encap:Ethernet  HWaddr 00:A0:CC:3E:E6:63  
  inet addr:64.81.146.215  Bcast:64.81.146.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:674842 errors:0 dropped:0 overruns:0 frame:0
  TX packets:791977 errors:0 dropped:0 overruns:0 carrier:0
  collisions:490 txqueuelen:100 
  Interrupt:11 Base address:0xd800 

loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  UP LOOPBACK RUNNING  MTU:3856  Metric:1
  RX packets:1138 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1138 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0 

Destination Gateway Genmask Flags Metric RefUse Iface
64.81.146.0 0.0.0.0 255.255.255.0   U 0  00 eth0
0.0.0.0 64.81.146.1 0.0.0.0 UG0  00 eth0


Is there a list somewhere of network options I should report (such as
filtering and tunneling)?

Should one of linux-kernel or linux-net be pruned from the Cc: list?

-- 
Paul Cassella


--- fs/read_write.c~Sat Nov 11 18:07:38 2000
+++ fs/read_write.c Sun Jan  7 14:00:25 2001
@@ -146,6 +146,8 @@
ssize_t ret;
struct file * file;
 
+extern struct file_operations socket_file_ops;
+
ret = -EBADF;
file = fget(fd);
if (file) {
@@ -165,6 +167,18 @@
DN_MODIFY);
fput(file);
}
+   if(ret < 0 && file && file->f_op == _file_ops ) {
+ switch(-ret) {
+   case EAGAIN: case EBADF: case EPIPE: case ENOSPC: case EIO: case 
+ECONNRESET:
+   case EINTR: case ETIMEDOUT: case EFAULT: case EINVAL: case EMSGSIZE: 
+case ENOMEM:
+   case ENOBUFS: case ENOTCONN: 
+ break;
+
+   default:
+   printk(KERN_ERR "sys_write: ret is unexpectedly %d.\n", ret);
+ }
+   }
+
return ret;
 }
 
--- net/socket.c~   Mon Jan  8 01:26:55 2001
+++ net/socket.cSun Jan  7 13:58:55 2001
@@ -111,7 +111,7 @@
  * in the operation structures but are done directly via the socketcall() 
multiplexor.
  */
 
-static struct file_operations socket_file_ops = {
+struct file_operations socket_file_ops = {
 llseek:sock_lseek,
 read:  sock_read,
 write: sock_write,

-
To unsubscrib

Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH:"No such process")

2001-01-07 Thread Paul Cassella

On Sun, 7 Jan 2001, David S. Miller wrote:

>Date:  Sun, 7 Jan 2001 23:55:28 -0600 (CST)
>From: Paul Cassella <[EMAIL PROTECTED]>
> 
>[1.] One line summary of the problem:
> 
>write() returns -1 and sets errno non-sensically.  2.4.0{,-ac[23]}
> 
> What you describe I can only say is "impossible".

Right.  That's why the code g_error()'s there.  :)

> None of them can occur via TCP socket writes (only netlink socket
> operations or socket control calls).

I didn't imagine that a TCP write could return any of these errors, which
is why I guessed that something's returning a value it thinks is positive,
but turns out to be negative in this case, for example, "return a - b;"
where due to, maybe, (but I guess not,) the peer shrunk window thing, a < b.

> Therefore I suspect you are perhaps getting rather some form of memory
> corruption or similar, really, please search the networking code for
> ESRCH value usage, you will see.

I believe that the tcp_sendmsg() path never tries to return -ESRCH, but I
don't see how userland memory corruption could cause the app to (1) open
such a socket instead of a TCP socket, and (2) get ESRCH or ENOEXEC
instead of EPERM or whatever a non-root process would get when it tried
such a thing.

I have seen no signs of bad memory on this machine, and of a fairly small
sample set, one (I was wrong when I said several) person is seeing the
same thing, with 2.4.0-test13-pre4, and I didn't see it with
2.4.0-test11-pre2.

The change I made to linux/fs/read_write.c:sys_write() checks

  file->f_op == _file_ops

and then finds ret == -3; presumably this is the same 3 that my app gets.


Would it be more helpful if I were to check something like

  socki_lookup(file->f_dentry->f_inode)->ops == tcp_prot

instead?  Or do the check in tcp_sendmsg()?

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH: "Nosuch process")

2001-01-07 Thread Paul Cassella

[1.] One line summary of the problem:

write() returns -1 and sets errno non-sensically.  2.4.0{,-ac[23]}

[2.] Full description of the problem/report:

write() sometimes returns -1 and sets errno to one of
1 (EPERM: "Operation not permitted")
3 (ESRCH: "No such process")
8 (ENOEXEC: "Exec format error")

1 seems to be the most common.

I have verified that at least the ESRCH case is caused by the kernel.  
With the following patch to sys_write(),

--- fs/read_write.c~   Sat Nov 11 18:07:38 2000
+++ fs/read_write.cSun Jan  7 14:00:25 2001
@@ -146,6 +146,8 @@
ssize_t ret;
struct file * file;
 
+extern struct file_operations socket_file_ops;
+
ret = -EBADF;
file = fget(fd);
if (file) {
@@ -165,6 +167,18 @@
 DN_MODIFY);
fput(file);
}
+   if(ret < 0 && file && file->f_op == _file_ops ) {
+ switch(-ret) {
+   case EAGAIN: case EBADF: case EPIPE: case ENOSPC: case EIO: case 
+ECONNRESET:
+   case EINTR: case ETIMEDOUT: case EFAULT: case EINVAL: case EMSGSIZE: 
+case ENOMEM:
+   case ENOBUFS: case ENOTCONN: 
+ break;
+
+   default:
+   printk(KERN_ERR "sys_write: ret is unexpectedly %d.\n", ret);
+ }
+   }
+
return ret;
 }

and socket_file_ops made non-static, I eventually got this logged:

Jan  7 22:15:29 localhost kernel: sys_write: ret is unexpectedly -3.


And this user code:

r = write(fd_that_is_a_tcp_socket, ...); 

if (r > 0) { ...
} else if(r == 0) { ...
} else if (errno == EAGAIN || errno == EINTR) { ...
} else if (errno == EPIPE || errno == ENOSPC || errno == EIO || errno == ECONNRESET || 
errno == ETIMEDOUT) { ...
} else {
  g_error("node_write: write failed with unexpected errno: %d (%s)\n", errno, 
g_strerror(errno));
}

at the same time produced

** ERROR **: node_write: write failed with unexpected errno: 3 (No such process)

aborting...

This socket is O_NONBLOCK'd, as well as SO_KEEPALIVE'd and SO_REUSEADDR'd,
and was an outgoing connection.

TCP ECN is on.  Syncookies are compiled in but turned off.

This app has previously failed with the same thing for errno's 1 and 8,
but without the kernel change to verify that those values are actually
coming from the kernel.

That had happened on 2.4.0, 2.4.0-ac2, and 2.4.0-ac3.  I didn't notice it
on 2.4.0-test11-pre2, which was the last kernel I'd been running.  But
the problem's not particularly deterministic, and I may not have been
running it long enough.  At least two other people are seeing the same
thing on 2.4 kernels.

My guess is that something mistakenly thinks it's returning a positive
number.

[3.] Keywords (i.e., modules, networking, kernel):

networking


[4.] Kernel version (from /proc/version):

Linux version 2.4.0-ac3 (fortytwo@manetheren) (gcc version 2.95.2 2220 (Debian 
GNU/Linux)) #4 Sun Jan 7 16:18:26 CST 2001

The machine and kernel are UP.


[6.] A small shell script or example program which triggers the
 problem (if possible)

This happens after the app runs usually for several hours with lots of
outgoing and incoming TCP connections to lots of different hosts.


[7.1.] Software (add the output of the ver_linux script here)

Linux manetheren 2.4.0-ac3 #4 Sun Jan 7 16:18:26 CST 2001 i686 unknown
Kernel modules 2.3.23
Gnu C  2.95.2
Gnu Make   3.79.1
Binutils   2.10.1.0.2
Linux C Library> libc.2.2
Dynamic linker ldd (GNU libc) 2.2
Procps 2.0.6
Mount  2.10q
Net-tools  2.05
Console-tools  0.2.3
Sh-utils   2.0.11
Modules Loaded parport_pc lp parport rtc usbcore


[7.2.] Processor information (from /proc/cpuinfo):
[7.3.] Module information (from /proc/modules):
[7.5.] PCI information ('lspci -vvv' as root)
[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

These doesn't seem particularly relevant, so I'm putting them, as well as
.config, at

 http://manetheren.eigenray.com/~fortytwo/bad_write_info

Don't hesitate to ask me for more info or to try a patch or something.


[7.7.] Other information that might be relevant to the problem
   (please look in /proc and include all information that you
   think to be relevant):

In 2.4.0 and -ac2, I was getting the reset_xmit_timer() messages that -ac3
fixed.  I'm also getting TCP peer shrinks window messages, but had none
this boot before this error.

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH: Nosuch process)

2001-01-07 Thread Paul Cassella

[1.] One line summary of the problem:

write() returns -1 and sets errno non-sensically.  2.4.0{,-ac[23]}

[2.] Full description of the problem/report:

write() sometimes returns -1 and sets errno to one of
1 (EPERM: "Operation not permitted")
3 (ESRCH: "No such process")
8 (ENOEXEC: "Exec format error")

1 seems to be the most common.

I have verified that at least the ESRCH case is caused by the kernel.  
With the following patch to sys_write(),

--- fs/read_write.c~   Sat Nov 11 18:07:38 2000
+++ fs/read_write.cSun Jan  7 14:00:25 2001
@@ -146,6 +146,8 @@
ssize_t ret;
struct file * file;
 
+extern struct file_operations socket_file_ops;
+
ret = -EBADF;
file = fget(fd);
if (file) {
@@ -165,6 +167,18 @@
 DN_MODIFY);
fput(file);
}
+   if(ret  0  file  file-f_op == socket_file_ops ) {
+ switch(-ret) {
+   case EAGAIN: case EBADF: case EPIPE: case ENOSPC: case EIO: case 
+ECONNRESET:
+   case EINTR: case ETIMEDOUT: case EFAULT: case EINVAL: case EMSGSIZE: 
+case ENOMEM:
+   case ENOBUFS: case ENOTCONN: 
+ break;
+
+   default:
+   printk(KERN_ERR "sys_write: ret is unexpectedly %d.\n", ret);
+ }
+   }
+
return ret;
 }

and socket_file_ops made non-static, I eventually got this logged:

Jan  7 22:15:29 localhost kernel: sys_write: ret is unexpectedly -3.


And this user code:

r = write(fd_that_is_a_tcp_socket, ...); 

if (r  0) { ...
} else if(r == 0) { ...
} else if (errno == EAGAIN || errno == EINTR) { ...
} else if (errno == EPIPE || errno == ENOSPC || errno == EIO || errno == ECONNRESET || 
errno == ETIMEDOUT) { ...
} else {
  g_error("node_write: write failed with unexpected errno: %d (%s)\n", errno, 
g_strerror(errno));
}

at the same time produced

** ERROR **: node_write: write failed with unexpected errno: 3 (No such process)

aborting...

This socket is O_NONBLOCK'd, as well as SO_KEEPALIVE'd and SO_REUSEADDR'd,
and was an outgoing connection.

TCP ECN is on.  Syncookies are compiled in but turned off.

This app has previously failed with the same thing for errno's 1 and 8,
but without the kernel change to verify that those values are actually
coming from the kernel.

That had happened on 2.4.0, 2.4.0-ac2, and 2.4.0-ac3.  I didn't notice it
on 2.4.0-test11-pre2, which was the last kernel I'd been running.  But
the problem's not particularly deterministic, and I may not have been
running it long enough.  At least two other people are seeing the same
thing on 2.4 kernels.

My guess is that something mistakenly thinks it's returning a positive
number.

[3.] Keywords (i.e., modules, networking, kernel):

networking


[4.] Kernel version (from /proc/version):

Linux version 2.4.0-ac3 (fortytwo@manetheren) (gcc version 2.95.2 2220 (Debian 
GNU/Linux)) #4 Sun Jan 7 16:18:26 CST 2001

The machine and kernel are UP.


[6.] A small shell script or example program which triggers the
 problem (if possible)

This happens after the app runs usually for several hours with lots of
outgoing and incoming TCP connections to lots of different hosts.


[7.1.] Software (add the output of the ver_linux script here)

Linux manetheren 2.4.0-ac3 #4 Sun Jan 7 16:18:26 CST 2001 i686 unknown
Kernel modules 2.3.23
Gnu C  2.95.2
Gnu Make   3.79.1
Binutils   2.10.1.0.2
Linux C Library libc.2.2
Dynamic linker ldd (GNU libc) 2.2
Procps 2.0.6
Mount  2.10q
Net-tools  2.05
Console-tools  0.2.3
Sh-utils   2.0.11
Modules Loaded parport_pc lp parport rtc usbcore


[7.2.] Processor information (from /proc/cpuinfo):
[7.3.] Module information (from /proc/modules):
[7.5.] PCI information ('lspci -vvv' as root)
[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

These doesn't seem particularly relevant, so I'm putting them, as well as
.config, at

 http://manetheren.eigenray.com/~fortytwo/bad_write_info

Don't hesitate to ask me for more info or to try a patch or something.


[7.7.] Other information that might be relevant to the problem
   (please look in /proc and include all information that you
   think to be relevant):

In 2.4.0 and -ac2, I was getting the reset_xmit_timer() messages that -ac3
fixed.  I'm also getting TCP peer shrinks window messages, but had none
this boot before this error.

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH:No such process)

2001-01-07 Thread Paul Cassella

On Sun, 7 Jan 2001, David S. Miller wrote:

Date:  Sun, 7 Jan 2001 23:55:28 -0600 (CST)
From: Paul Cassella [EMAIL PROTECTED]
 
[1.] One line summary of the problem:
 
write() returns -1 and sets errno non-sensically.  2.4.0{,-ac[23]}
 
 What you describe I can only say is "impossible".

Right.  That's why the code g_error()'s there.  :)

 None of them can occur via TCP socket writes (only netlink socket
 operations or socket control calls).

I didn't imagine that a TCP write could return any of these errors, which
is why I guessed that something's returning a value it thinks is positive,
but turns out to be negative in this case, for example, "return a - b;"
where due to, maybe, (but I guess not,) the peer shrunk window thing, a  b.

 Therefore I suspect you are perhaps getting rather some form of memory
 corruption or similar, really, please search the networking code for
 ESRCH value usage, you will see.

I believe that the tcp_sendmsg() path never tries to return -ESRCH, but I
don't see how userland memory corruption could cause the app to (1) open
such a socket instead of a TCP socket, and (2) get ESRCH or ENOEXEC
instead of EPERM or whatever a non-root process would get when it tried
such a thing.

I have seen no signs of bad memory on this machine, and of a fairly small
sample set, one (I was wrong when I said several) person is seeing the
same thing, with 2.4.0-test13-pre4, and I didn't see it with
2.4.0-test11-pre2.

The change I made to linux/fs/read_write.c:sys_write() checks

  file-f_op == socket_file_ops

and then finds ret == -3; presumably this is the same 3 that my app gets.


Would it be more helpful if I were to check something like

  socki_lookup(file-f_dentry-f_inode)-ops == tcp_prot

instead?  Or do the check in tcp_sendmsg()?

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH:No such process)

2001-01-07 Thread Paul Cassella

On Mon, 8 Jan 2001, Andi Kleen wrote:

 On Sun, Jan 07, 2001 at 11:55:28PM -0600, Paul Cassella wrote:

  write() returns -1 and sets errno non-sensically.  2.4.0{,-ac[23]}

 Would it be possible to provide a compiling test case that shows these
 errors ? 

The CVS version (perhaps even version 0.12) of gtk-gnutella should do it,
though it's not exactly what I'm running.  After the crash I posted about,
I've been running it for another three and a half hours under similar
load, and it hasn't crashed again yet; nor have I seen any other
unexpected ret's (besides those noted below) get logged.

http://gtk-gnutella.sourceforge.net/cvs/

Though I don't think gtk-gnutella is special.  It doesn't do anything
programmatically unusual with sockets, it just has a bunch of connections
to a bunch of different machines, which are probably running a bunch of
different os's, etc.  And it aborts when write() returns unexpected
values.

I've appended the actual kernel diffs that I'm using; I'd just pasted them
from an xterm before.  I probably should have added ECONNREFUSED and
ERESTARTSYS to the list.


 Also over what interface do you run it? 

eth0 (tulip):
01:08.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 21)


In answer to David's questions, I don't think I'm doing anything out of
the ordinary.  I'm running over DSL with the above card on an external DSL
router.  No netfilter, no tunneling, just IP.

CONFIG_PACKET=m
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK=y
CONFIG_RTNETLINK=y
CONFIG_NETLINK_DEV=m
# CONFIG_NETFILTER is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
CONFIG_INET_ECN=y
CONFIG_SYN_COOKIES=y
CONFIG_IPV6=m
# CONFIG_IPV6_EUI64 is not set
# CONFIG_KHTTPD is not set
# CONFIG_ATM is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_DECNET is not set
# CONFIG_BRIDGE is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_LLC is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set

CONFIG_DUMMY=m
CONFIG_TULIP=y

I have these set, but I haven't used them:
CONFIG_PPP=m
# CONFIG_PPP_MULTILINK is not set
CONFIG_PPP_ASYNC=m
# CONFIG_PPP_SYNC_TTY is not set
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m


eth0  Link encap:Ethernet  HWaddr 00:A0:CC:3E:E6:63  
  inet addr:64.81.146.215  Bcast:64.81.146.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:674842 errors:0 dropped:0 overruns:0 frame:0
  TX packets:791977 errors:0 dropped:0 overruns:0 carrier:0
  collisions:490 txqueuelen:100 
  Interrupt:11 Base address:0xd800 

loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  UP LOOPBACK RUNNING  MTU:3856  Metric:1
  RX packets:1138 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1138 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0 

Destination Gateway Genmask Flags Metric RefUse Iface
64.81.146.0 0.0.0.0 255.255.255.0   U 0  00 eth0
0.0.0.0 64.81.146.1 0.0.0.0 UG0  00 eth0


Is there a list somewhere of network options I should report (such as
filtering and tunneling)?

Should one of linux-kernel or linux-net be pruned from the Cc: list?

-- 
Paul Cassella


--- fs/read_write.c~Sat Nov 11 18:07:38 2000
+++ fs/read_write.c Sun Jan  7 14:00:25 2001
@@ -146,6 +146,8 @@
ssize_t ret;
struct file * file;
 
+extern struct file_operations socket_file_ops;
+
ret = -EBADF;
file = fget(fd);
if (file) {
@@ -165,6 +167,18 @@
DN_MODIFY);
fput(file);
}
+   if(ret  0  file  file-f_op == socket_file_ops ) {
+ switch(-ret) {
+   case EAGAIN: case EBADF: case EPIPE: case ENOSPC: case EIO: case 
+ECONNRESET:
+   case EINTR: case ETIMEDOUT: case EFAULT: case EINVAL: case EMSGSIZE: 
+case ENOMEM:
+   case ENOBUFS: case ENOTCONN: 
+ break;
+
+   default:
+   printk(KERN_ERR "sys_write: ret is unexpectedly %d.\n", ret);
+ }
+   }
+
return ret;
 }
 
--- net/socket.c~   Mon Jan  8 01:26:55 2001
+++ net/socket.cSun Jan  7 13:58:55 2001
@@ -111,7 +111,7 @@
  * in the operation structures but are done directly via the socketcall() 
multiplexor.
  */
 
-static struct file_operations socket_file_ops = {
+struct file_operations socket_file_ops = {
 llseek:sock_lseek,
 read:  sock_read,
 write: sock_write,

-
To unsubscribe from this list: send the line "unsubscri

Re: [RFC] Semaphores used for daemon wakeup

2000-12-21 Thread Paul Cassella

On Fri, 22 Dec 2000, Daniel Phillips wrote:

> But isn't this actually a simple situation?  How about:

I had only adapted that example because it had already been posted showing
one way to do it, and so provided something to compare the sv approach to.

> dmabuf_alloc(...)
> {
> while (1) {
> spin_lock(_lock);
> attempt to grab a free buffer;
> spin_unlock(_lock);
> if (success)
>return;
> down(_wait);
> }
> }

> dmabuf_free(...)
> {
> spin_lock(_lock);
> free up buffer;
> spin_unlock(_lock);
> up(_wait);
> }

This does things a little differently than the way the original did it.  
I thought the original implied that dmabuf_free() might free up multiple
buffers.  There's no indication in the comments that this is the case, but
the original, by using vall_sema(), wakes up all dmabuf_alloc()'s that had
gone to sleep.

The example wasn't meant to be an ideal use of sv's, but merely as an
example of how they could be used to achieve the same behavior as the code
that was posted.

--
Paul Cassella
[EMAIL PROTECTED]


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] Semaphores used for daemon wakeup

2000-12-21 Thread Paul Cassella

The mechanism being developed here seems a lot like synchronization
variables (aka condition variables), which are a part of the "monitor"
synchronization construct.  There is a simple implementation of them in
the xfs patch.  I've been working on a more general version in order to
aid porting some other stuff, which I have appended to this post.

I had been holding off on posting about it since I didn't have any code
that used it ready, and probably wouldn't until 2.5 anyway, but due to
this thread, I think bringing it up now might be helpful.


Daniel Phillips wrote:
> Tim Wright wrote:


> > p_sema_v_lock(, priority, );  /* atomically release the lock AND */
> > /* go to sleep on the semaphore */

This in particular looks like sv_wait(), which atomically releases a
lock and goes to sleep.

The style is somewhat different, as a sync variable (sv) is not really a
lock, but is still something that a process can block on.  A process goes
to sleep by calling sv_wait(sv), and is woken up by another process
calling sv_signal(sv) (wake one) or sv_broadcast(sv) (wake all).  If there
is no process sleeping in sv_wait, signals and broadcasts have no effect;
they do not affect any process which subsequently calls sv_wait(). 

Each sync variable is associated with another lock, which provides mutual
exclusion guarantees.  This lock must be held to call sv_wait(), and is
dropped atomically with the process going to sleep.  This lock must also
be held to call sv_signal() or sv_broadcast().  Currently, this lock can
be a spinlock or a semaphore; other types of locks could be added if
necessary. 

The sync variable version of the dmabuf code snippet (assuming the
dmabuf_mutex is never acquired from an interrupt) would look like this: 


dmabuf_init(...);
{
...
spin_lock_init(_spin);
sv_init(_sv, _spin, SV_MON_SPIN);
...
}

dmabuf_alloc(...)
{

...
while (1) {
spin_lock(_spin);
attempt to grab a free buffer;
if (success){
spin_unlock(_spin);
return;
} else {
sv_wait(_sv);
}
}
}

dmabuf_free(...)
{
...
spin_lock(_spin);
free up buffer;
sv_broadcast(_sv);
spin_unlock(_spin);
}

If dmabuf_free() could be called from an interrupt, this would be modified
by passing the SV_INTS flag to sv_init(), using spin_lock_irq() in
dmabuf_alloc, and spin_lock_irqsave/restore in dmabuf_free().

> > As you can see, the spinlocks ensure no races, and the key is the atomicity
> > of p_sema_v_lock(). No-one can race in and sleep on dmabuf_wait, because
> > they have to hold dmabuf_mutex to do so. Exactly the same mechanism would
> > work for the bdflush problem.

The same protections are in place here, as the two methods are basically
the same. 


> described.  The main difference between this situation and bdflush is
> that dmabuf_free isn't really waiting on dmabuf_alloc to fullfill a
> condition (other than to get out of its exclusion region) while bdflush
> can have n waiters.

This could be done with two sv's.  After all, there are two conditions:
"someone needs bdflush to run", and "bdflush is done". 


> int atomic_read_and_clear(atomic_t *p)
> {
> int n = atomic_read(p);
> atomic_sub(p, n);
> return n;
> }

I don't think this will work; consider two callers doing the atomic_read() 
at the same time, or someone else doing an atomic_dec() after the
atomic_read(). 


> the more involved wake_up_process().  What's clear is, they are all
> plenty fast enough for this application, and what I'm really trying for
> is readability.

I think sv's are pretty readable and clear, but I'd like to find out what
other people think.

If anyone finds this useful or finds any bugs, or has any questions or
suggestions, please let me know.  I'll be reading the list, but I'm going
on vacation tomorrow, so I'd appreciate a Cc: so I have a better chance of
answering before then.  Thanks.


-- 
Paul Cassella
[EMAIL PROTECTED]


And now, the code.

include/linux/sv.h:

/*
This is the version I'm using with a test8-pre1 kernel, so it uses the
old TASK_EXCLUSIVE semantics; it should be trivial to make it work with
new kernels.  I haven't yet done so, since we're going to be using the
test8-pre1 kernel for a few more weeks yet. 

In the interests of brevity, I've taken out most of the debugging code,
some typecasting, and some other things like the wrapper functions for
up() and spin_unlock(), which are needed because we need a function
pointer, and these may be macros or inline fuctions. 

This was originally based on the version Steve Lord did for t

Re: [RFC] Semaphores used for daemon wakeup

2000-12-21 Thread Paul Cassella

The mechanism being developed here seems a lot like synchronization
variables (aka condition variables), which are a part of the "monitor"
synchronization construct.  There is a simple implementation of them in
the xfs patch.  I've been working on a more general version in order to
aid porting some other stuff, which I have appended to this post.

I had been holding off on posting about it since I didn't have any code
that used it ready, and probably wouldn't until 2.5 anyway, but due to
this thread, I think bringing it up now might be helpful.


Daniel Phillips wrote:
 Tim Wright wrote:


  p_sema_v_lock(sema, priority, lock);  /* atomically release the lock AND */
  /* go to sleep on the semaphore */

This in particular looks like sv_wait(), which atomically releases a
lock and goes to sleep.

The style is somewhat different, as a sync variable (sv) is not really a
lock, but is still something that a process can block on.  A process goes
to sleep by calling sv_wait(sv), and is woken up by another process
calling sv_signal(sv) (wake one) or sv_broadcast(sv) (wake all).  If there
is no process sleeping in sv_wait, signals and broadcasts have no effect;
they do not affect any process which subsequently calls sv_wait(). 

Each sync variable is associated with another lock, which provides mutual
exclusion guarantees.  This lock must be held to call sv_wait(), and is
dropped atomically with the process going to sleep.  This lock must also
be held to call sv_signal() or sv_broadcast().  Currently, this lock can
be a spinlock or a semaphore; other types of locks could be added if
necessary. 

The sync variable version of the dmabuf code snippet (assuming the
dmabuf_mutex is never acquired from an interrupt) would look like this: 


dmabuf_init(...);
{
...
spin_lock_init(dmabuf_spin);
sv_init(dmabuf_sv, dmabuf_spin, SV_MON_SPIN);
...
}

dmabuf_alloc(...)
{

...
while (1) {
spin_lock(dmabuf_spin);
attempt to grab a free buffer;
if (success){
spin_unlock(dmabuf_spin);
return;
} else {
sv_wait(dmabuf_sv);
}
}
}

dmabuf_free(...)
{
...
spin_lock(dmabuf_spin);
free up buffer;
sv_broadcast(dmabuf_sv);
spin_unlock(dmabuf_spin);
}

If dmabuf_free() could be called from an interrupt, this would be modified
by passing the SV_INTS flag to sv_init(), using spin_lock_irq() in
dmabuf_alloc, and spin_lock_irqsave/restore in dmabuf_free().

  As you can see, the spinlocks ensure no races, and the key is the atomicity
  of p_sema_v_lock(). No-one can race in and sleep on dmabuf_wait, because
  they have to hold dmabuf_mutex to do so. Exactly the same mechanism would
  work for the bdflush problem.

The same protections are in place here, as the two methods are basically
the same. 


 described.  The main difference between this situation and bdflush is
 that dmabuf_free isn't really waiting on dmabuf_alloc to fullfill a
 condition (other than to get out of its exclusion region) while bdflush
 can have n waiters.

This could be done with two sv's.  After all, there are two conditions:
"someone needs bdflush to run", and "bdflush is done". 


 int atomic_read_and_clear(atomic_t *p)
 {
 int n = atomic_read(p);
 atomic_sub(p, n);
 return n;
 }

I don't think this will work; consider two callers doing the atomic_read() 
at the same time, or someone else doing an atomic_dec() after the
atomic_read(). 


 the more involved wake_up_process().  What's clear is, they are all
 plenty fast enough for this application, and what I'm really trying for
 is readability.

I think sv's are pretty readable and clear, but I'd like to find out what
other people think.

If anyone finds this useful or finds any bugs, or has any questions or
suggestions, please let me know.  I'll be reading the list, but I'm going
on vacation tomorrow, so I'd appreciate a Cc: so I have a better chance of
answering before then.  Thanks.


-- 
Paul Cassella
[EMAIL PROTECTED]


And now, the code.

include/linux/sv.h:

/*
This is the version I'm using with a test8-pre1 kernel, so it uses the
old TASK_EXCLUSIVE semantics; it should be trivial to make it work with
new kernels.  I haven't yet done so, since we're going to be using the
test8-pre1 kernel for a few more weeks yet. 

In the interests of brevity, I've taken out most of the debugging code,
some typecasting, and some other things like the wrapper functions for
up() and spin_unlock(), which are needed because we need a function
pointer, and these may be macros or inline fuctions. 

This was originally based on the version Steve Lord did for the xfs port. 
This version had no probl

Re: [RFC] Semaphores used for daemon wakeup

2000-12-21 Thread Paul Cassella

On Fri, 22 Dec 2000, Daniel Phillips wrote:

 But isn't this actually a simple situation?  How about:

I had only adapted that example because it had already been posted showing
one way to do it, and so provided something to compare the sv approach to.

 dmabuf_alloc(...)
 {
 while (1) {
 spin_lock(dmabuf_lock);
 attempt to grab a free buffer;
 spin_unlock(dmabuf_lock);
 if (success)
return;
 down(dmabuf_wait);
 }
 }

 dmabuf_free(...)
 {
 spin_lock(dmabuf_lock);
 free up buffer;
 spin_unlock(dmabuf_lock);
 up(dmabuf_wait);
 }

This does things a little differently than the way the original did it.  
I thought the original implied that dmabuf_free() might free up multiple
buffers.  There's no indication in the comments that this is the case, but
the original, by using vall_sema(), wakes up all dmabuf_alloc()'s that had
gone to sleep.

The example wasn't meant to be an ideal use of sv's, but merely as an
example of how they could be used to achieve the same behavior as the code
that was posted.

--
Paul Cassella
[EMAIL PROTECTED]


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/