Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread David S. Miller


[EMAIL PROTECTED] writes:
 > > See?
 > 
 > I see! Dave, please, take the second Andrea's patch (appended).
 > It is really the cleanest one.

Thanks a lot Andrea and Alexey.  I've applied the patch.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread kuznet

Hello!

> If send_head doesn't point to skb then it is before it (and it cannot
> advance under us of course because we hold the sock lock) and so in such
> case we didn't clobber the send_head at all in skb_entail, and so we
> don't need to touch send_head in order to undo (we only need to unlink).
> 
> See?

I see! Dave, please, take the second Andrea's patch (appended).
It is really the cleanest one.

Alexey


--- 2.4.4aa3/net/ipv4/tcp.c.~1~	Tue May  1 10:44:57 2001
+++ 2.4.4aa3/net/ipv4/tcp.c	Tue May  1 12:00:25 2001
@@ -1183,11 +1183,8 @@
 
 do_fault:
 	if (skb->len==0) {
-		if (tp->send_head == skb) {
-			tp->send_head = skb->next;
-			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
-				tp->send_head = NULL;
-		}
+		if (tp->send_head == skb)
+			tp->send_head = NULL;
 		__skb_unlink(skb, skb->list);
 		tcp_free_skb(sk, skb);
 	}
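
The effect of the applied patch can be checked on a toy model. The following is a minimal sketch in plain C, not kernel code: `toy_skb`, `toy_sock` and the `toy_*` helpers are hypothetical stand-ins, and `toy_entail()` assumes that skb_entail appends at the tail of the write queue and sets send_head only when it was NULL, which is what the argument quoted above relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct sk_buff and the socket (hypothetical names). */
struct toy_skb {
	struct toy_skb *next, *prev;
	unsigned int len;
};

struct toy_sock {
	struct toy_skb queue;		/* sentinel of the circular write queue */
	struct toy_skb *send_head;	/* first unsent skb, or NULL if none */
};

static void toy_init(struct toy_sock *sk)
{
	sk->queue.next = sk->queue.prev = &sk->queue;
	sk->queue.len = 0;
	sk->send_head = NULL;
}

/* Assumed behaviour of skb_entail(): append at the tail and set
 * send_head only if nothing was waiting to be sent. */
static void toy_entail(struct toy_sock *sk, struct toy_skb *skb)
{
	skb->prev = sk->queue.prev;
	skb->next = &sk->queue;
	sk->queue.prev->next = skb;
	sk->queue.prev = skb;
	if (sk->send_head == NULL)
		sk->send_head = skb;
}

/* The do_fault undo, following the shape of the applied patch. */
static void toy_undo(struct toy_sock *sk, struct toy_skb *skb)
{
	if (skb->len == 0) {
		if (sk->send_head == skb)
			sk->send_head = NULL;
		skb->prev->next = skb->next;	/* unlink from the queue */
		skb->next->prev = skb->prev;
	}
}

/* Returns 1 if entail followed by undo restores the earlier state.
 * unsent_pending selects whether an unsent skb is already queued. */
int toy_roundtrip_ok(int unsent_pending)
{
	struct toy_sock sk;
	struct toy_skb old = { NULL, NULL, 100 };
	struct toy_skb fresh = { NULL, NULL, 0 };
	struct toy_skb *head_before, *tail_before;

	toy_init(&sk);
	if (unsent_pending)
		toy_entail(&sk, &old);
	head_before = sk.send_head;
	tail_before = sk.queue.prev;

	toy_entail(&sk, &fresh);
	toy_undo(&sk, &fresh);

	return sk.send_head == head_before &&
	       sk.queue.prev == tail_before &&
	       tail_before->next == &sk.queue;
}
```

Calling `toy_roundtrip_ok(0)` covers the case where send_head was NULL on entry, and `toy_roundtrip_ok(1)` the case where an unsent skb was already queued and send_head must be left alone.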




Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread Andrea Arcangeli

On Tue, May 01, 2001 at 09:25:43PM +0400, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > zero and we are running in such slow path, it is obvious the send_head
> > _was_ NULL when we entered the critical section, so it's perfectly fine
> 
> It is not only not obvious, it is not true almost always.
> On normally working tcp send_head is almost never NULL,
> it is NULL only when application is so slow that is not able
> to saturate pipe. If you do not believe my word, add printk checking this. 8)

Note: I said: ".. if send_head points to skb and skb->len is
                  ^^
zero and we are running in such slow path ..".

If send_head doesn't point to skb then it is before it (and it cannot
advance under us of course because we hold the sock lock) and so in such
case we didn't clobber the send_head at all in skb_entail, and so we
don't need to touch send_head in order to undo (we only need to unlink).

See?

Andrea



Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread kuznet

Hello!

> zero and we are running in such slow path, it is obvious the send_head
> _was_ NULL when we entered the critical section, so it's perfectly fine

It is not only not obvious, it is not true almost always.
On normally working tcp send_head is almost never NULL,
it is NULL only when application is so slow that is not able
to saturate pipe. If you do not believe my word, add printk checking this. 8)

Alexey



Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread Andrea Arcangeli

On Tue, May 01, 2001 at 08:44:52PM +0400, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > this is the strict fix:
> 
> Andrea, you caught the problem!
> 
> The fix is not right though (it is equivalent to straight
> tp->send_head=NULL, as you noticed. It also corrupts queue in
> an opposite manner.) Right fix is appended.
> 
> Explanation: in do_fault we must undo effect of enqueueing new segment
> in the case the segment remained empty. tp->send_head points to
> the first unsent skb in queue and it is NULL when and only when
> all the skbs are already sent. (Invariant is: tp->send_head==NULL ||
> tp->send_head->seq == tp->snd_nxt)
> I crapped this case except for the case when queue is completely empty,
> so that the last sent skb was accounted in packets_out twice...

I understand the explanation but I don't think my patch is wrong, I
think it's simpler and faster instead.

My argument is very simple, if send_head points to skb and skb->len is
zero and we are running in such slow path, it is obvious the send_head
_was_ NULL when we entered the critical section, so it's perfectly fine
to set send_head back to null and to unlink the skb as the only actions
to undo the skb_entail. That's all. I don't see how my patch can fail.
If I'm missing something I'd love a further explanation indeed. Thanks!

> 
> Damn, what a silly mistake was it... shame.
> 
> Alexey
> 
> 
> --- ../vger3-010426/linux/net/ipv4/tcp.c	Wed Apr 25 21:02:18 2001
> +++ linux/net/ipv4/tcp.c	Tue May  1 20:38:44 2001
> @@ -1185,7 +1187,7 @@
>  	if (skb->len==0) {
>  		if (tp->send_head == skb) {
>  			tp->send_head = skb->prev;
> -			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
> +			if (TCP_SKB_CB(skb)->seq == tp->snd_nxt)
>  				tp->send_head = NULL;
>  		}
>  		__skb_unlink(skb, skb->list);


Andrea



Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread kuznet

Hello!

> this is the strict fix:

Andrea, you caught the problem!

The fix is not right though (it is equivalent to straight
tp->send_head=NULL, as you noticed. It also corrupts queue in
an opposite manner.) Right fix is appended.

Explanation: in do_fault we must undo effect of enqueueing new segment
in the case the segment remained empty. tp->send_head points to
the first unsent skb in queue and it is NULL when and only when
all the skbs are already sent. (Invariant is: tp->send_head==NULL ||
tp->send_head->seq == tp->snd_nxt)
I crapped this case except for the case when queue is completely empty,
so that the last sent skb was accounted in packets_out twice...

Damn, what a silly mistake was it... shame.

Alexey


--- ../vger3-010426/linux/net/ipv4/tcp.c	Wed Apr 25 21:02:18 2001
+++ linux/net/ipv4/tcp.c	Tue May  1 20:38:44 2001
@@ -1185,7 +1187,7 @@
 	if (skb->len==0) {
 		if (tp->send_head == skb) {
 			tp->send_head = skb->prev;
-			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
+			if (TCP_SKB_CB(skb)->seq == tp->snd_nxt)
 				tp->send_head = NULL;
 		}
 		__skb_unlink(skb, skb->list);
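
The invariant stated above (send_head is NULL, or its seq equals snd_nxt) can be turned into a small check. This is a hedged sketch on a toy model rather than kernel code: the `toy_*` types and functions are hypothetical, `seq` stands in for TCP_SKB_CB(skb)->seq, and the scenario assumes one skb that has already been sent followed by an empty skb that skb_entail made the send_head.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb {
	struct toy_skb *next, *prev;
	unsigned int seq;	/* stands in for TCP_SKB_CB(skb)->seq */
	unsigned int len;
};

struct toy_sock {
	struct toy_skb queue;		/* sentinel of the write queue */
	struct toy_skb *send_head;	/* first unsent skb, or NULL */
	unsigned int snd_nxt;
};

/* Invariant from the message: send_head==NULL || send_head->seq==snd_nxt */
static int toy_invariant_holds(const struct toy_sock *sk)
{
	return sk->send_head == NULL || sk->send_head->seq == sk->snd_nxt;
}

static void toy_queue_tail(struct toy_sock *sk, struct toy_skb *skb)
{
	skb->prev = sk->queue.prev;
	skb->next = &sk->queue;
	sk->queue.prev->next = skb;
	sk->queue.prev = skb;
}

/* Undo with either the pre-patch test (is skb->prev the queue head?)
 * or the patched test (sequence comparison), chosen by use_seq_check. */
static void toy_undo(struct toy_sock *sk, struct toy_skb *skb, int use_seq_check)
{
	if (sk->send_head == skb) {
		sk->send_head = skb->prev;
		if (use_seq_check ? (skb->seq == sk->snd_nxt)
				  : (sk->send_head == &sk->queue))
			sk->send_head = NULL;
	}
	skb->prev->next = skb->next;	/* unlink */
	skb->next->prev = skb->prev;
}

/* Returns 1 if the invariant still holds after undoing the empty skb. */
int toy_invariant_after_undo(int use_seq_check)
{
	struct toy_sock sk;
	struct toy_skb sent = { NULL, NULL, 0, 100 };	/* already on the wire */
	struct toy_skb fresh = { NULL, NULL, 100, 0 };	/* empty new segment */

	sk.queue.next = sk.queue.prev = &sk.queue;
	sk.queue.seq = 0;
	sk.queue.len = 0;
	sk.send_head = NULL;
	sk.snd_nxt = 100;		/* everything queued so far is sent */

	toy_queue_tail(&sk, &sent);
	toy_queue_tail(&sk, &fresh);
	sk.send_head = &fresh;		/* assumed effect of skb_entail() */

	toy_undo(&sk, &fresh, use_seq_check);
	return toy_invariant_holds(&sk);
}
```

With the pre-patch test, send_head ends up pointing at the skb that was already sent, so the invariant fails unless the queue was completely empty. With the sequence test it is reset to NULL.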



Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread Andrea Arcangeli

On Mon, Apr 30, 2001 at 09:00:09PM +0400, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > My current theory is that tcpblast does something erratic when the
> > error occurs.
> 
> It has buffer size of 32K, so that it faults at enough large chunk sizes.
> 
> Erratic errno is because this applet prints errno on partial write.
> 
> Oops is apparently because I did something wrong in do_fault yet.
> Seems, you were right telling that this place looks dubious. 8)

this is the strict fix:

diff -urN z/net/ipv4/tcp.c z1/net/ipv4/tcp.c
--- z/net/ipv4/tcp.c	Tue May  1 12:14:14 2001
+++ z1/net/ipv4/tcp.c	Tue May  1 12:12:35 2001
@@ -1184,7 +1184,7 @@
 do_fault:
 	if (skb->len==0) {
 		if (tp->send_head == skb) {
-			tp->send_head = skb->prev;
+			tp->send_head = skb->next;
 			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
 				tp->send_head = NULL;
 		}


really the logic can be implemented more efficiently this way:

--- 2.4.4aa3/net/ipv4/tcp.c.~1~	Tue May  1 10:44:57 2001
+++ 2.4.4aa3/net/ipv4/tcp.c	Tue May  1 12:00:25 2001
@@ -1183,11 +1183,8 @@
 
 do_fault:
 	if (skb->len==0) {
-		if (tp->send_head == skb) {
-			tp->send_head = skb->next;
-			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
-				tp->send_head = NULL;
-		}
+		if (tp->send_head == skb)
+			tp->send_head = NULL;
 		__skb_unlink(skb, skb->list);
 		tcp_free_skb(sk, skb);
 	}

Andrea
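
Why the second patch can drop the skb->next test may be easier to see on a toy list. The sketch below is an illustration under assumptions, not kernel code: `toy_skb` and the `toy_*` helpers are hypothetical, and it assumes the new skb is always appended at the tail of a circular list whose head acts as sentinel, so that its next pointer is the sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff; the queue head doubles as sentinel. */
struct toy_skb {
	struct toy_skb *next, *prev;
};

/* Append skb at the tail of the circular list rooted at head. */
static void toy_queue_tail(struct toy_skb *head, struct toy_skb *skb)
{
	skb->prev = head->prev;
	skb->next = head;
	head->prev->next = skb;
	head->prev = skb;
}

/* "Strict" undo value: follow skb->next and map the sentinel to NULL. */
static struct toy_skb *toy_strict(struct toy_skb *head, struct toy_skb *skb)
{
	struct toy_skb *send_head = skb->next;

	if (send_head == head)
		send_head = NULL;
	return send_head;
}

/* Returns 1 if, with n_sent older skbs queued, the strict variant
 * yields NULL for a freshly appended skb, as the shorter patch assumes. */
int toy_strict_is_null(int n_sent)
{
	struct toy_skb head, sent[8], fresh;
	int i;

	if (n_sent < 0 || n_sent > 8)
		return 0;
	head.next = head.prev = &head;
	for (i = 0; i < n_sent; i++)
		toy_queue_tail(&head, &sent[i]);
	toy_queue_tail(&head, &fresh);

	return fresh.next == &head && toy_strict(&head, &fresh) == NULL;
}
```

Whatever the number of older skbs, the tail's next pointer is the sentinel, so in this model the strict variant always ends with send_head set to NULL, which the shorter patch does directly.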



Re: 2.4.4: Kernel crash, possibly tcp related

2001-04-30 Thread Ingo Oeser

On Mon, Apr 30, 2001 at 06:46:33PM +0200, Andrea Arcangeli wrote:
> On Sun, Apr 29, 2001 at 11:58:20PM -0700, David S. Miller wrote:
> > Andrew Morton writes:
> >  > "David S. Miller" wrote:
> > Anyways, I just tried to reproduce Ralf's problem on two of my
> > machines.  One was an SMP sparc64 system, and the other was my
> > uniprocessor Athlon.
> > 
> > What kind of machine are you reproducing this on Ralf?  I'm not
> 
> JFYI: I reproduced too on my UP athlon. I run:
> 
>   tcpblast -d0 -s 40481 another_host 9000
> 
> two times and after the second it locked hard. I didn't have any fork
> bomb at the same time but there was a high computing load in the
> background.

I tried sth. else with 2.4.3-ac13, which could be related:

Machine: 1GB RAM, Dual PIII, ServerWorks LE chipset (Asus CUR-DLS board). 
NIC: [Ethernet Pro 100] (rev 08) (driven by eepro100)

0. Run several kernel compiles and the like to fill up caches.
1. copy a complete iso image into /tmp (which is tmpfs)
2. ftp that over 100Mbit network to a machine.

I got a lot of spikes and a message "mm: critical shortage of
bounce buffers", while doing 1.

And I get a LOT of those messages, while doing 2. But I have a lot
of memory in pagecache and only 100MB allocated for other
processes. And I still have swap free (I have 2GB of swap as
recommended).

So could we please check, double check and triple check the
allocations in the net layer?

Another machine of mine needs now 128MB with the new kernel and
will lock up hard otherwise on full saturated 100Mbit network
load[1] with TCP, but needed only 32MB before. sth. has to be
wrong here...

More info on request. 

I have both machines at hand and they are both ready for testing
as long, as my file systems stay repairable by fsck.ext2 ;-)

Both machines are not running X, frame buffers and no fancy multi
media stuff.

Regards

Ingo Oeser

[1] Tested cards: RTL 8139, Intel Etherexpress Pro 100, 3com
   3c509TX, so I guess it's NOT the NIC ;-)
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag http://www.tu-chemnitz.de/linux/tag
  been there and had much fun   



Re: 2.4.4: Kernel crash, possibly tcp related

2001-04-30 Thread kuznet

Hello!

> My current theory is that tcpblast does something erratic when the
> error occurs.

It has buffer size of 32K, so that it faults at enough large chunk sizes.

Erratic errno is because this applet prints errno on partial write.

Oops is apparently because I did something wrong in do_fault yet.
Seems, you were right telling that this place looks dubious. 8)

Alexey
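
The errno point above is a general one: errno is only meaningful when write() returns -1, so printing it after a short (partial) write reports whatever stale value happens to be there. A minimal sketch of a write loop that respects this follows. `write_fully` and `write_fully_demo` are hypothetical names and this is not tcpblast's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf, retrying after short writes.
 * Returns 0 on success, -1 with errno set on a real failure. */
int write_fully(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;		/* errno is valid only here */
		}
		/* n < len is a short write, not an error: errno is stale. */
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Self-check over a pipe; returns 1 if the bytes round-trip. */
int write_fully_demo(void)
{
	int fds[2];
	char in[6] = { 0 };
	int ok;

	if (pipe(fds) != 0)
		return 0;
	ok = write_fully(fds[1], "hello", 5) == 0 &&
	     read(fds[0], in, 5) == 5 &&
	     memcmp(in, "hello", 5) == 0;
	close(fds[0]);
	close(fds[1]);
	return ok;
}
```

A caller would report an error with perror() only when `write_fully()` returns -1, never merely because a single write() returned fewer bytes than requested.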



Re: 2.4.4: Kernel crash, possibly tcp related

2001-04-30 Thread Andrea Arcangeli

On Sun, Apr 29, 2001 at 11:58:20PM -0700, David S. Miller wrote:
> 
> Andrew Morton writes:
>  > "David S. Miller" wrote:
>  > > 
>  > > I'm having a devil of a time finding the tcpblast sources on the
>  > > net, can you point me to where I can get them?
>  > 
>  > I seem to have a copy. 
>  > 
>  > http://www.zip.com.au/~akpm/tcpblast-19990504.tar.gz
> 
> Thanks to everyone who pointed me at this and the debian copy :-)
> 
> Anyways, I just tried to reproduce Ralf's problem on two of my
> machines.  One was an SMP sparc64 system, and the other was my
> uniprocessor Athlon.
> 
> What kind of machine are you reproducing this on Ralf?  I'm not

JFYI: I reproduced too on my UP athlon. I run:

tcpblast -d0 -s 40481 another_host 9000

two times and after the second it locked hard. I didn't have any fork
bomb at the same time but there was a high computing load in the
background.

the nic is:

Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet LANCE] (rev 36)

Andrea



Re: 2.4.4: Kernel crash, possibly tcp related

2001-04-30 Thread Ralf Nyren




On Sun, 29 Apr 2001, David S. Miller wrote:

[snip]
>
> Anyways, I just tried to reproduce Ralf's problem on two of my
> machines.  One was an SMP sparc64 system, and the other was my
> uniprocessor Athlon.
>
> What kind of machine are you reproducing this on Ralf?  I'm not
> even getting the very strange errors from tcpblast on the command
> line, it is functioning perfectly fine and sending a stream of
> data to the other machine.  Are you doing something weird like
> making the remote machine the local machine in your tcpblast run?
>
> Later,
> David S. Miller
> [EMAIL PROTECTED]
>


Sorry for not including a reference to the software. I used the
tcpblast program from Debian (unstable). It can be found in the
netdiag package:
http://ftp.debian.org/debian/dists/woody/main/source/net/netdiag_0.7.orig.tar.gz

Since this problem seemed a bit hard to reproduce I tested it on another
machine too. It needed some more load, but eventually crashed.
This machine is a PII 400MHz, 128MB, 440BX/ZX, PIIX. 3c905B network card.
For more information like .config, System.map, ver_linux etc see:
http://www.educ.umu.se/~plumbum/kernel/panic_2.4.4_20010430/

Regarding the strange error msg: tcp/udpblast send:: No such file or directory
both the precompiled binary and one compiled from the source produced
this message. Although I noticed that the min blocksize triggering the message
changed from 40481 to 39841. Probably some compiletime feature :)

Making remote machine the local machine... no, I send from my machine
to another. Both with 100Mbps network connections.

Reproduction procedure:
./tcpblast -d0 -s 20 _another_host_ 9000
./forkbomb
wait...

The so called "forkbomb" shouldn't really be necessary, some heavy load
making use of scheduler, memory and swap seems to do the thing.

Hope this information could be helpful.

regards,
/Ralf




Re: 2.4.4: Kernel crash, possibly tcp related

2001-04-30 Thread David S. Miller


Andrew Morton writes:
 > "David S. Miller" wrote:
 > > 
 > > I'm having a devil of a time finding the tcpblast sources on the
 > > net, can you point me to where I can get them?
 > 
 > I seem to have a copy. 
 > 
 > http://www.zip.com.au/~akpm/tcpblast-19990504.tar.gz

Thanks to everyone who pointed me at this and the debian copy :-)

Anyways, I just tried to reproduce Ralf's problem on two of my
machines.  One was an SMP sparc64 system, and the other was my
uniprocessor Athlon.

What kind of machine are you reproducing this on Ralf?  I'm not
even getting the very strange errors from tcpblast on the command
line, it is functioning perfectly fine and sending a stream of
data to the other machine.  Are you doing something weird like
making the remote machine the local machine in your tcpblast run?

Later,
David S. Miller
[EMAIL PROTECTED]



Re: 2.4.4: Kernel crash, possibly tcp related

2001-04-30 Thread J Sloan

"David S. Miller" wrote:

> I'm having a devil of a time finding the tcpblast sources on the
> net, can you point me to where I can get them?  The one reference
> I saw to get the original sources was:
>
> ftp://ftp.xlink.net/pub/network/tcpblast.shar.gz
>
> But even that directory no longer exists.

Try ftp://wintermute.toyota.com/pub/utils/tcpblast.tar

cu

jjs




Re: 2.4.4: Kernel crash, possibly tcp related

2001-04-29 Thread David S. Miller


Ralf Nyren writes:
 > The problem appears when this value is set to 40481 or higher. For ex:
 > $ tcpblast -d0 -s 40481 another_host 9000
 ...
 > KERNEL: assertion (!skb_queue_empty(&sk->write_queue)) failed at tcp_timer.c(327):
 > tcp_retransmit_timer
 > Unable to handle kernel NULL pointer dereference...

I'm having a devil of a time finding the tcpblast sources on the
net, can you point me to where I can get them?  The one reference
I saw to get the original sources was:

ftp://ftp.xlink.net/pub/network/tcpblast.shar.gz

But even that directory no longer exists.

The kernel error you see is a gross fatal error, the TCP retransmit
timer has fired yet there are no packets on the transmit queue :-)

My current theory is that tcpblast does something erratic when the
error occurs.

Later,
David S. Miller
[EMAIL PROTECTED]


