Re: Client receives TCP packets but does not ACK

2001-07-01 Thread Nivedita Singhvi

> The bad network behavior was due to shared irqs somehow screwing 
> things up. This explained most but not all of the problems. 

ah, that's why your test pgm succeeded on my systems..
 
> When I last posted I had a reproducible test case which spewed a bunch
> of packets from a server to a client. The behavior is that the client
> eventually stops ACKing and so the connection stalls indefinitely.
> [...] Each routine had 2 to 6 conditions that would result in a dropped
> packet. I added printk statements for each of these conditions in
> hopes of detecting why the final packet is not acked. I recompiled
> the kernel, and reran the test. The result was that the packet was
> being dropped in tcp_rcv_established() due to an invalid checksum.

Ouch!

In the interests of not having it be so painful to identify the
problem (to this point, i.e. TCP drops due to checksum failures) 
the next time around, I'd like to ask:

- Were you seeing any bad csum error messages in /var/log/messages?
  Or was it only TCP that saw them?

- Was the stats field /proc/net/snmp/Tcp:InErrs
  reflecting those drops?

- What additional logging/stats gathering would have made this
  (silent drops due to checksum failures by TCP) easier to detect?

  My 2c:

  The stat TcpInErrs is updated for most TCP input failures,
  so it's not obvious (unless you're real familiar with TCP)
  that there are checksum failures happening. It actually
  includes only these errors:
  - checksum failures
  - header len problems
  - unexpected SYNs
 
  Is this adequate as a diagnostic, or would adding a breakdown
  counter(s) for checksum (and other) failures be useful? 
  At the moment, TCP does no logging on a plain vanilla
  kernel; you have to recompile the kernel with NETDEBUG in order
  to see logged checksum failures, at least at the TCP level.

  It would be nice to have people be able to look at a counter or 
  stat on the fly and tell whether they're having packets silently 
  dropped due to checksum failures (and other issues) without needing 
  to recompile the kernel...
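
For reference, here is a minimal sketch of reading the existing counter from user space. It assumes the usual /proc/net/snmp layout (a line of field names followed by a line of values for each protocol); parse_snmp and tcp_in_errs are made-up names for illustration, not an existing tool.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {field: int}}."""
    stats = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: entries come in pairs, a header line of field names
    # followed by a line of values, both prefixed with "Proto:".
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        vproto, nums = values.split(":", 1)
        if proto != vproto:
            continue  # unexpected layout; skip rather than guess
        stats[proto] = dict(zip(names.split(), (int(n) for n in nums.split())))
    return stats


def tcp_in_errs(path="/proc/net/snmp"):
    """Return the current Tcp InErrs value read from the given file."""
    with open(path) as f:
        return parse_snmp(f.read())["Tcp"]["InErrs"]
```

Sampling tcp_in_errs() before and after a test run shows whether the counter moved, but as noted above it cannot say which of the three error types was responsible.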
   
Any thoughts?

thanks,
Nivedita

---
I'd appreciate a cc since I'm not subscribed..
[EMAIL PROTECTED]
[EMAIL PROTECTED] 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Client receives TCP packets but does not ACK

2001-06-26 Thread Robert Kleemann

SUMMARY:

The bad network behavior was due to shared irqs somehow screwing
things up.  This explained most but not all of the problems.

DETAILS:

Many people emailed me that they were experiencing similar problems.
Even though the cause of my problem is not kernel related, I'm hoping
my narrative and eventual solution will help some folks.  I also
still think this behavior is really weird so those of you with an
abundance of brains and curiosity might want to take a guess at
explaining the behavior that I'm seeing.

When I last posted I had a reproducible test case which spewed a bunch
of packets from a server to a client.  The behavior is that the client
eventually stops ACKing and so the connection stalls indefinitely.
I spent some time studying the kernel networking code and traced the
code path taken by a tcp packet:

linux/net/core/dev.c:netif_rx() // packet received by eth card
linux/net/ipv4/ip_input.c:ip_rcv()
linux/net/ipv4/ip_input.c:ip_rcv_finish()
linux/net/ipv4/tcp_ipv4.c:tcp_v4_rcv()
linux/net/ipv4/tcp_ipv4.c:tcp_v4_do_rcv()
linux/net/ipv4/tcp_input.c:tcp_rcv_established() // packet placed in user queue

Each routine had 2 to 6 conditions that would result in a dropped
packet.  I added printk statements for each of these conditions in
hopes of detecting why the final packet is not acked.  I recompiled
the kernel, and reran the test.  The result was that the packet was
being dropped in tcp_rcv_established() due to an invalid checksum.  I
then ran tcpdump to verify that the packets sent from the server were
the same packets that were received by the client.  It turned out that
one byte was being corrupted and it was always the same byte in the
stream that was corrupted.
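
To illustrate why one corrupted byte is enough to get a segment dropped, here is a minimal sketch of the 16-bit ones'-complement sum that the TCP checksum is built on (RFC 1071 style), plus a helper for locating the first differing byte between two captures. It is not the kernel's code: the real TCP checksum also covers a pseudo-header, and internet_checksum and first_difference are names made up for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def first_difference(sent: bytes, received: bytes):
    """Offset of the first differing byte, or None if the streams match."""
    for i, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            return i
    if len(sent) != len(received):
        return min(len(sent), len(received))
    return None
```

Running first_difference over the payloads extracted from the server-side and client-side dumps gives the stream offset of the bad byte, and recomputing internet_checksum over the received bytes shows it no longer matches the sender's value.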

This was very confusing because my previous logs show _no_ corruption
of the final packet.

Anyway, now it appeared to be a hardware related problem so I started
swapping ethernet cards to no effect.  I then looked at the irqs (cat
/proc/interrupts) and noticed that the ethernet card in the client was
sharing an irq with the aic7xxx scsi adapter. The following url made
me think that this could be causing a problem:
http://www.scyld.com/expert/irq-conflict.html
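
A rough sketch of automating that check: it scans /proc/interrupts-style text for irq lines that list more than one device. It assumes the layout where each numbered line has per-CPU counts, a single-word controller name, and then a comma-separated device list; shared_irqs is a made-up name, and other kernels may format the file differently.

```python
def shared_irqs(text):
    """Return {irq: [device, ...]} for irq lines listing several devices."""
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue                  # CPU header line, NMI/ERR rows, etc.
        fields = rest.split()
        # Drop the per-CPU interrupt counts at the start of the line.
        while fields and fields[0].isdigit():
            fields.pop(0)
        # Assumption: the next field is a one-word controller name
        # (e.g. XT-PIC); everything after it is the device list.
        names = " ".join(fields[1:])
        devices = [d.strip() for d in names.split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared
```

Feeding it the contents of /proc/interrupts returns an empty dict when no numbered irq is shared.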

The motherboard on the client is an old Intel PR440FX (dual 200mhz
PPro, onboard LAN, SCSI) and doesn't allow any kind of configuring of
the irqs so I ended up throwing another pci net card in the box just
to juggle the irqs enough so that one of the net cards was not sharing
an irq with the scsi card.  The bug no longer repros!  Neither the
reduced test case nor the original shows any problems.

My only remaining questions are:

1) Does this make sense?  Would a scsi card sharing an irq with a net
   card cause rare but highly reproducible corruption?  I was able to
   run http, telnet, ftp, mail, and games through this card with no
   problems.  It only failed on a specific set of data.  This is what
   initially led me to believe that the problem was not hardware
   related.

2) Now that two net cards are sharing an irq, have I just traded one
   subtle corruption bug for another?  Will some different data set
   cause the same type of corruption?  Is it safe to share irqs?

3) My old tcpdump logs (from several weeks ago) show _no_ corruption.
   I would have believed that I must have screwed up except that I
   still have the logs and the packets sent from the server compare
   exactly with those received by the client.  I can't seem to
   reproduce this behavior.

Robert.




RE: Client receives TCP packets but does not ACK

2001-06-20 Thread David Schwartz


> Btw: can the application somehow ask the tcp/ip stack what was
> actually acked?
> (ie. how many bytes were acked).

No, and you shouldn't want to know. Even if the other end ACKed the data,
that doesn't mean that the application on the other end didn't crash. So it
won't tell you what you want to know, which is 'did the application on the
other end process the data?'.

Application-level guarantees can only be provided by application-level
code.

DS




Re: Client receives TCP packets but does not ACK

2001-06-18 Thread dean gaudet

On Tue, 19 Jun 2001, Jonathan Morton wrote:

> >>  >>  Btw: can the application somehow ask the tcp/ip stack what was
> >>  >>  actually acked?
> >>  >>  (ie. how many bytes were acked).
> >>  >
> >>  >no, but it's not necessarily a useful number anyhow -- because it's
> >>  >possible that the remote end ACKd bytes but the ACK never arrives.  so you
> >>  >can get into a situation where the remote application has the entire
> >>  >message but the local application doesn't know.  the only way to solve
> >>  >this is above the TCP layer.  (message duplicate elimination using an
> >>  >unique id.)
> >>
> >>  No, because if the ACK doesn't reach the sending machine, the sender
> >>  will retry the data until it does get an ACK.
> >
> >if the network goes down in between, the sender may never get the ACK.
> >the sender will see a timeout eventually.  the receiver may already be
> >done with the connection and closed it and never see the error.  if it
> >were a protocol such as SMTP then the sender would retry later, and the
> >result would be a duplicate message.  (which you can eliminate above the
> >TCP layer using unique ids.)
>
> But, if the sender does not attempt to close the socket until the ACK
> returns, then the receiver will see an unfinished connection and
> (hopefully) realise that the message is unsafe and not attempt to
> send it.

suppose the network goes away and doesn't come back.  the ACK never gets
through.

> With SMTP, the last piece of data is a QUIT anyway, which occurs
> after the end-of-message marker - once the QUIT is sent and/or
> received, both ends know that the connection is finished with and
> will close the socket independently of each other.  If the network
> disappears before the QUIT comes along, the receiver should be
> discarding messages and the sender retrying later.

QUIT is the last step of a session which can include multiple messages. a
single message begins with the "MAIL FROM:" and ends with the "." that
terminates the DATA section.  after that the smtp server sends back the
"250 OK".  the smtp client is now free to sit on the connection "forever",
possibly beginning another "MAIL FROM:".  (i.e. connection caching in
sendmail.)

but in the meanwhile, the smtp server has moved the message into its next
phase of delivery as soon as it sends back the "250 OK", which could
include having forwarded it off the box, or to a mailing list, etc.

so in this example the point where you want to consider network failure is after the
smtp client has sent the trailing "." closing the DATA section, and that
data has been received by the smtp server.  then the network fails before
the ACK (and "250 OK") returns to the smtp client.

in this case the client has no choice but to resend later.  (both sides
should get an error eventually assuming the implementors don't suck,
unlike in some other protocols such as HTTP/0.9 and HTTP/1.0 where the
protocol itself is flawed.)  the result is probably a message duplicate.

so what you were asking about was, what if the smtp server in this case
could find out that the "250 OK" was never ACKd.  in that case just move
the network failure a little later in the series of events... and also
consider cases where the TCP stack ACKs but the application never gets to
read() the data (system failures).

basically these transactional semantics have to occur above TCP/IP itself.
this is where QUIT comes in.  and to some extent, message-IDs for
duplicate elimination.  (HTTP/1.1 introduced chunked transfer
encoding to handle variable length dynamic responses in the face of
network failures... but folks building web forms still need to put
unique-ids into the forms to handle the duplicate message problem.)
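
A minimal sketch of that duplicate elimination, with the network replaced by a plain function call. It assumes the sender attaches the same unique id to every retry of a message; Receiver, send_with_retry and the lost_replies knob are hypothetical names for illustration, not taken from any real SMTP or HTTP implementation.

```python
import uuid


class Receiver:
    """Processes each message id at most once, but acknowledges every copy."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def handle(self, msg_id, body):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.delivered.append(body)
        return "250 OK"   # duplicates are re-acknowledged so the sender can stop


def send_with_retry(receiver, body, lost_replies=0, max_tries=5):
    """Resend until a reply gets through; return the number of attempts."""
    msg_id = uuid.uuid4().hex            # the same id is reused on every retry
    for attempt in range(1, max_tries + 1):
        reply = receiver.handle(msg_id, body)
        if attempt <= lost_replies:
            continue                     # simulate the reply being lost in transit
        if reply == "250 OK":
            return attempt
    raise TimeoutError("no acknowledgement received")
```

With lost_replies=2 the sender transmits three times, yet the receiver delivers the body once. In a real server the set of seen ids would have to be persisted and eventually expired, which this sketch leaves out.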

i guess knowing the number of ACK'd bytes might be a useful debugging aid,
but i'd fear it being misused by app writers.

-dean




Re: Client receives TCP packets but does not ACK

2001-06-18 Thread Jonathan Morton

>>  >>  Btw: can the application somehow ask the tcp/ip stack what was
>>  >>  actually acked?
>>  >>  (ie. how many bytes were acked).
>>  >
>>  >no, but it's not necessarily a useful number anyhow -- because it's
>>  >possible that the remote end ACKd bytes but the ACK never arrives.  so you
>>  >can get into a situation where the remote application has the entire
>>  >message but the local application doesn't know.  the only way to solve
>>  >this is above the TCP layer.  (message duplicate elimination using an
>>  >unique id.)
>>
>>  No, because if the ACK doesn't reach the sending machine, the sender
>>  will retry the data until it does get an ACK.
>
>if the network goes down in between, the sender may never get the ACK.
>the sender will see a timeout eventually.  the receiver may already be
>done with the connection and closed it and never see the error.  if it
>were a protocol such as SMTP then the sender would retry later, and the
>result would be a duplicate message.  (which you can eliminate above the
>TCP layer using unique ids.)

But, if the sender does not attempt to close the socket until the ACK 
returns, then the receiver will see an unfinished connection and 
(hopefully) realise that the message is unsafe and not attempt to 
send it.

With SMTP, the last piece of data is a QUIT anyway, which occurs 
after the end-of-message marker - once the QUIT is sent and/or 
received, both ends know that the connection is finished with and 
will close the socket independently of each other.  If the network 
disappears before the QUIT comes along, the receiver should be 
discarding messages and the sender retrying later.
-- 
--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
website:  http://www.chromatix.uklinux.net/vnc/
geekcode: GCS$/E dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$
   V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
tagline:  The key to knowledge is not to rely on people to teach you it.



Re: Client receives TCP packets but does not ACK

2001-06-18 Thread dean gaudet



On Mon, 18 Jun 2001, Jonathan Morton wrote:

> >>  Btw: can the application somehow ask the tcp/ip stack what was
> >>  actually acked?
> >>  (ie. how many bytes were acked).
> >
> >no, but it's not necessarily a useful number anyhow -- because it's
> >possible that the remote end ACKd bytes but the ACK never arrives.  so you
> >can get into a situation where the remote application has the entire
> >message but the local application doesn't know.  the only way to solve
> >this is above the TCP layer.  (message duplicate elimination using an
> >unique id.)
>
> No, because if the ACK doesn't reach the sending machine, the sender
> will retry the data until it does get an ACK.

if the network goes down in between, the sender may never get the ACK.
the sender will see a timeout eventually.  the receiver may already be
done with the connection and closed it and never see the error.  if it
were a protocol such as SMTP then the sender would retry later, and the
result would be a duplicate message.  (which you can eliminate above the
TCP layer using unique ids.)

-dean




Re: Client receives TCP packets but does not ACK

2001-06-18 Thread Jonathan Morton

>>  Btw: can the application somehow ask the tcp/ip stack what was
>>  actually acked?
>>  (ie. how many bytes were acked).
>
>no, but it's not necessarily a useful number anyhow -- because it's
>possible that the remote end ACKd bytes but the ACK never arrives.  so you
>can get into a situation where the remote application has the entire
>message but the local application doesn't know.  the only way to solve
>this is above the TCP layer.  (message duplicate elimination using an
>unique id.)

No, because if the ACK doesn't reach the sending machine, the sender 
will retry the data until it does get an ACK.  So the sender always 
has information about some amount of data which is guaranteed to have 
arrived at the other end.  The receiver might know about this sooner, 
but that's simply a function of network latency.

The fundamental problem, if I understand right, is that some stacks 
allow packets indicating closing of a connection (FIN) to arrive 
before the actual data at the end of the connection does.  The only 
workaround I can think of for this is for the closing stack to wait 
until all sent data has been ACKed before sending the FIN.  The ACK 
may, of course, never arrive, but that's what round-trip timeouts are 
for.
-- 
--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
website:  http://www.chromatix.uklinux.net/vnc/
geekcode: GCS$/E dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$
   V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
tagline:  The key to knowledge is not to rely on people to teach you it.



Re: Client receives TCP packets but does not ACK

2001-06-18 Thread dean gaudet

On Mon, 18 Jun 2001, Jan Hudec wrote:

> Btw: can the application somehow ask the tcp/ip stack what was actually acked?
> (ie. how many bytes were acked).

no, but it's not necessarily a useful number anyhow -- because it's
possible that the remote end ACKd bytes but the ACK never arrives.  so you
can get into a situation where the remote application has the entire
message but the local application doesn't know.  the only way to solve
this is above the TCP layer.  (message duplicate elimination using an
unique id.)

if the #bytes ack'd was available it would probably fool people into
implementing buggy code (which of course they do anyhow :)

-dean




Re: Client receives TCP packets but does not ACK

2001-06-18 Thread Jan Hudec

> > TCP is NOT a guaranteed protocol -- you can't just blast data from one
> > port to another and expect it to work.
> 
> Isn't it? Are you really sure about that? I thought UDP was the
> not-guaranteed-one and TCP was the one guaranteeing that all data reaches the
> other end in order and all. Please enlighten me.

It's "half guaranteed." It guarantees that if data are delivered to the
receiver, all data sent before have already arrived, and in the correct
order. But it's not guaranteed that data successfully written on one end
actually arrived unless the connection was correctly shut down and closed.
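
A small sketch of what a correct shutdown can look like from the application's side: the sender half-closes the connection and then waits for the peer's reply and end-of-file before treating the data as delivered. The function names and the b"OK" reply are assumptions made for this example, and a local socket.socketpair() is a convenient stand-in for a real TCP connection when trying it out.

```python
import socket


def send_and_confirm(sock, payload):
    """Send payload, half-close, then wait for the peer's reply and EOF."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)     # tell the peer no more data is coming
    reply = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:                 # peer has closed its side as well
            break
        reply += chunk
    return reply


def receive_all_and_reply(sock, reply=b"OK"):
    """Read until EOF, then send an application-level confirmation."""
    data = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:                 # sender half-closed: all data is in
            break
        data += chunk
    sock.sendall(reply)
    sock.close()
    return data
```

The sender only learns that everything arrived because the receiving application says so after reading to end-of-file; the transport's own acknowledgements are never consulted.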

Btw: can the application somehow ask the tcp/ip stack what was actually acked?
(ie. how many bytes were acked).


- Jan Hudec `Bulb' <[EMAIL PROTECTED]>



Re: Client receives TCP packets but does not ACK

2001-06-18 Thread Jan Hudec

  TCP is NOT a guaranteed protocol -- you can't just blast data from one
 port
  to another and expect it to work.
 
 Isn't it? Are you really sure about that? I thought UDP was the
 not-guaranteed-one and TCP was the one guaranting that all data reaches the
 other end in order and all. Please enlighten me.

It's hlaf guaranteed. It guarantees, that if data are delivered to the
reciever, all data sent before already arived and in correct order. But it's
not guaranteed that data succesuly writen on 1 end actualy arived unless the
connection was correctly shutdown and closed. 

Btw: can the aplication somehow ask the tcp/ip stack what was actualy acked?
(ie. how many bytes were acked).


- Jan Hudec `Bulb' [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Client receives TCP packets but does not ACK

2001-06-18 Thread dean gaudet

On Mon, 18 Jun 2001, Jan Hudec wrote:

 Btw: can the aplication somehow ask the tcp/ip stack what was actualy acked?
 (ie. how many bytes were acked).

no, but it's not necessarily a useful number anyhow -- because it's
possible that the remote end ACKd bytes but the ACK never arrives.  so you
can get into a situation where the remote application has the entire
message but the local application doesn't know.  the only way to solve
this is above the TCP layer.  (message duplicate elimination using an
unique id.)

if the #bytes ack'd was available it would probably fool people into
implementing buggy code (which of course they do anyhow :)

-dean

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Client receives TCP packets but does not ACK

2001-06-18 Thread Jonathan Morton

   Btw: can the aplication somehow ask the tcp/ip stack what was 
actualy acked?
  (ie. how many bytes were acked).

no, but it's not necessarily a useful number anyhow -- because it's
possible that the remote end ACKd bytes but the ACK never arrives.  so you
can get into a situation where the remote application has the entire
message but the local application doesn't know.  the only way to solve
this is above the TCP layer.  (message duplicate elimination using an
unique id.)

No, because if the ACK doesn't reach the sending machine, the sender 
will retry the data until it does get an ACK.  So the sender always 
has information about some amount of data which is guaranteed to have 
arrived at the other end.  The receiver might know about this sooner, 
but that's simply a function of network latency.

The fundamental problem, if I understand right, is that some stacks 
allow packets indicating closing of a connection (FIN) to arrive 
before the actual data at the end of the connection does.  The only 
workaround I can think of for this is for the closing stack to wait 
until all sent data has been ACKed before sending the FIN.  The ACK 
may, of course, never arrive, but that's what round-trip timeouts are 
for.
-- 
--
from: Jonathan Chromatix Morton
mail: [EMAIL PROTECTED]  (not for attachments)
website:  http://www.chromatix.uklinux.net/vnc/
geekcode: GCS$/E dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$
   V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
tagline:  The key to knowledge is not to rely on people to teach you it.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Client receives TCP packets but does not ACK

2001-06-18 Thread dean gaudet



On Mon, 18 Jun 2001, Jonathan Morton wrote:

Btw: can the aplication somehow ask the tcp/ip stack what was
 actualy acked?
   (ie. how many bytes were acked).
 
 no, but it's not necessarily a useful number anyhow -- because it's
 possible that the remote end ACKd bytes but the ACK never arrives.  so you
 can get into a situation where the remote application has the entire
 message but the local application doesn't know.  the only way to solve
 this is above the TCP layer.  (message duplicate elimination using an
 unique id.)

 No, because if the ACK doesn't reach the sending machine, the sender
 will retry the data until it does get an ACK.

if the network goes down in between, the sender may never get the ACK.
the sender will see a timeout eventually.  the receiver may already be
done with the connection and closed it and never see the error.  if it
were a protocol such as SMTP then the sender would retry later, and the
result would be a duplicate message.  (which you can eliminate above the
TCP layer using unique ids.)

-dean

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Client receives TCP packets but does not ACK

2001-06-18 Thread Jonathan Morton

> > > > Btw: can the application somehow ask the tcp/ip stack what was
> > > > actually acked?
> > > > (ie. how many bytes were acked).
> > >
> > > no, but it's not necessarily a useful number anyhow -- because it's
> > > possible that the remote end ACKd bytes but the ACK never arrives.  so you
> > > can get into a situation where the remote application has the entire
> > > message but the local application doesn't know.  the only way to solve
> > > this is above the TCP layer.  (message duplicate elimination using a
> > > unique id.)
> >
> > No, because if the ACK doesn't reach the sending machine, the sender
> > will retry the data until it does get an ACK.
>
> if the network goes down in between, the sender may never get the ACK.
> the sender will see a timeout eventually.  the receiver may already be
> done with the connection and closed it and never see the error.  if it
> were a protocol such as SMTP then the sender would retry later, and the
> result would be a duplicate message.  (which you can eliminate above the
> TCP layer using unique ids.)

But, if the sender does not attempt to close the socket until the ACK 
returns, then the receiver will see an unfinished connection and 
(hopefully) realise that the message is unsafe and not attempt to 
send it.

With SMTP, the last piece of data is a QUIT anyway, which occurs 
after the end-of-message marker - once the QUIT is sent and/or 
received, both ends know that the connection is finished with and 
will close the socket independently of each other.  If the network 
disappears before the QUIT comes along, the receiver should be 
discarding messages and the sender retrying later.
-- 
--
from: Jonathan Chromatix Morton
mail: [EMAIL PROTECTED]  (not for attachments)
website:  http://www.chromatix.uklinux.net/vnc/
geekcode: GCS$/E dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$
   V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
tagline:  The key to knowledge is not to rely on people to teach you it.



Re: Client receives TCP packets but does not ACK

2001-06-18 Thread dean gaudet

On Tue, 19 Jun 2001, Jonathan Morton wrote:

> > > > > Btw: can the application somehow ask the tcp/ip stack what was
> > > > > actually acked?
> > > > > (ie. how many bytes were acked).
> > > >
> > > > no, but it's not necessarily a useful number anyhow -- because it's
> > > > possible that the remote end ACKd bytes but the ACK never arrives.  so you
> > > > can get into a situation where the remote application has the entire
> > > > message but the local application doesn't know.  the only way to solve
> > > > this is above the TCP layer.  (message duplicate elimination using a
> > > > unique id.)
> > >
> > > No, because if the ACK doesn't reach the sending machine, the sender
> > > will retry the data until it does get an ACK.
> >
> > if the network goes down in between, the sender may never get the ACK.
> > the sender will see a timeout eventually.  the receiver may already be
> > done with the connection and closed it and never see the error.  if it
> > were a protocol such as SMTP then the sender would retry later, and the
> > result would be a duplicate message.  (which you can eliminate above the
> > TCP layer using unique ids.)
>
> But, if the sender does not attempt to close the socket until the ACK
> returns, then the receiver will see an unfinished connection and
> (hopefully) realise that the message is unsafe and not attempt to
> send it.

suppose the network goes away and doesn't come back.  the ACK never gets
through.

> With SMTP, the last piece of data is a QUIT anyway, which occurs
> after the end-of-message marker - once the QUIT is sent and/or
> received, both ends know that the connection is finished with and
> will close the socket independently of each other.  If the network
> disappears before the QUIT comes along, the receiver should be
> discarding messages and the sender retrying later.

QUIT is the last step of a session which can include multiple messages. a
single message begins with the MAIL FROM:foo and ends with the . that
terminates the DATA section.  after that the smtp server sends back the
250 OK.  the smtp client is now free to sit on the connection forever,
possibly beginning another MAIL FROM:foo.  (i.e. connection caching in
sendmail.)

but in the meanwhile, the smtp server has moved the message into its next
phase of delivery as soon as it sends back the 250 OK, which could
include having forwarded it off the box, or to a mailing list, etc.

so in this example the failure you want to consider comes after the smtp
client has sent the trailing . closing the DATA section, and that data has
been received by the smtp server.  the network then fails before the ACK
(and 250 OK) returns to the smtp client.

in this case the client has no choice but to resend later.  (both sides
should get an error eventually assuming the implementors don't suck,
unlike in some other protocols such as HTTP/0.9 and HTTP/1.0 where the
protocol itself is flawed.)  the result is probably a message duplicate.

so what you were asking about was, what if the smtp server in this case
could find out that the 250 OK was never ACKd.  in that case just move
the network failure a little later in the series of events... and also
consider cases where the TCP stack ACKs but the application never gets to
read() the data (system failures).

basically these transactional semantics have to occur above TCP/IP itself.
this is where QUIT comes in.  and to some extent, message-IDs for
duplicate elimination.  (in HTTP/1.1 the introduction of chunked transfer
encoding to handle variable length dynamic responses in the face of
network failures... but folks building web forms still need to put
unique-ids into the forms to handle the duplicate message problem.)

i guess knowing the number of ACK'd bytes might be a useful debugging aid,
but i'd fear it being misused by app writers.

-dean
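
The duplicate-elimination idea described above can be sketched as follows.
This is only an illustration under stated assumptions, not code from the
thread: the class name DedupReceiver, the helper new_message, and the use of
a random UUID as the message id are all hypothetical choices made for the
example, and a real receiver would need to persist and expire the set of
seen ids.

```python
import uuid


class DedupReceiver:
    """Receiver that applies each message at most once, keyed by a unique id."""

    def __init__(self):
        self.seen = set()       # ids already processed (unbounded in this sketch)
        self.delivered = []     # payloads handed on to the application

    def handle(self, msg_id, payload):
        """Return True if the payload was newly delivered, False if a duplicate."""
        if msg_id in self.seen:
            return False        # sender retried after a lost reply; drop it
        self.seen.add(msg_id)
        self.delivered.append(payload)
        return True


def new_message(payload):
    """Tag a payload with a fresh id; the sender reuses this id on every retry."""
    return (uuid.uuid4().hex, payload)


if __name__ == "__main__":
    rx = DedupReceiver()
    msg = new_message(b"transfer 10 units")
    print(rx.handle(*msg))      # first delivery
    print(rx.handle(*msg))      # retry after the reply was lost in transit
    print(len(rx.delivered))
```

The point of the design is that the sender never needs to know whether the
first copy arrived: it can always retry, and the receiver decides.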




Re: Client receives TCP packets but does not ACK

2001-06-17 Thread dean gaudet

On Sun, 17 Jun 2001, Dan Podeanu wrote:

> Is there any logical reason why if, given fd is a connected, AF_INET,
> SOCK_STREAM socket, and one does a write(fd, buffer, len); close(fd);
> to the peer, over a rather slow network (read modem, satellite link, etc),
> the data gets lost (the remote receives the disconnect before the last
> packet). According to socket(7), even if SO_LINGER is not set, the data
> is flushed in the background.

suppose A writes B, and A closes its fd.  suppose that B writes to A and
it arrives at A before the A->B data has left the buffer.  in that case A
will RST and drop the data in its buffer, so you've lost the data you
thought you had transmitted.

> Is it Linux or TCP specific? Or some obvious technical detail I'm missing?

it's TCP.

the linux specific behaviour (and the recommended behaviour now) is that,
in the case above, if the B->A traffic arrived before the close but wasn't
read by the application on A, the RST is sent immediately.  this generally
results in folks discovering their broken applications much earlier than
on other stacks.  it's basically a race condition as to when the B->A data
arrives.

the above is the reason apache uses at least 4 system calls to tear down a
connection... with http/1.1 and pipelining it's totally valid for the B->A
traffic to be sent regardless of what's happening in the A->B direction.
(ditto for multiplexed protocols... and to some extent SMTP pipelining.)

-dean
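
The multi-call tear-down mentioned above is commonly known as a lingering
close. A minimal sketch of that pattern, assuming a connected stream socket;
the function name lingering_close, the 2 second timeout and the drain limit
are illustrative assumptions, not Apache's actual code or values.

```python
import socket


def lingering_close(sock, timeout=2.0, max_drain=65536):
    """Half-close, drain what the peer still sends, then close.

    Reading the pending B->A data before close() avoids the case where
    unread input makes A answer with RST and discard its unsent A->B data.
    """
    sock.shutdown(socket.SHUT_WR)      # send FIN: we are done writing
    sock.settimeout(timeout)           # do not wait forever for the peer
    drained = 0
    try:
        while drained < max_drain:
            chunk = sock.recv(4096)
            if not chunk:              # peer has closed its side as well
                break
            drained += len(chunk)
    except (socket.timeout, ConnectionError):
        pass                           # give up quietly; we are closing anyway
    finally:
        sock.close()
    return drained
```

The drained bytes are thrown away here; the only goal is that nothing is
left unread in the receive buffer at the moment close() runs.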




Re: Client receives TCP packets but does not ACK

2001-06-17 Thread Dan Podeanu

On Sun, Jun 17, 2001 at 05:13:43PM -0400, Albert D. Cahalan wrote:
> > Is there any logical reason why if, given fd is a connected, AF_INET,
> > SOCK_STREAM socket, and one does a write(fd, buffer, len); close(fd);
> > to the peer, over a rather slow network (read modem, satellite link, etc),
> > the data gets lost (the remote receives the disconnect before the last
> > packet). According to socket(7), even if SO_LINGER is not set, the data
> > is flushed in the background.
> > 
> > Is it Linux or TCP specific? Or some obvious technical detail I'm missing?
> 
> The UNIX API (Linux, BSD, Solaris, OSF/1...) requires that you
> put that write() call in a loop, because you can get partial
> writes. Repeat until done... the OS might do 1 byte at a time.

Not so true. The write is completed successfully, i.e.
size == write(fd, buf, size); so the data actually gets to the kernel's
network buffer, only the background polling is not done properly, in the
way I see things.
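
For reference, the write loop Albert describes can be sketched as below. The
helper name write_all is an assumption made for this example. Note that, as
the reply above points out, even a complete write only means the data has
reached the kernel's buffer, not that the peer has received it.

```python
import os


def write_all(fd, data):
    """Write all of data to fd, retrying after partial writes."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write may accept fewer bytes than offered; it returns the count.
        written = os.write(fd, view[total:])
        total += written
    return total
```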



Re: Client receives TCP packets but does not ACK

2001-06-17 Thread Andi Kleen

Alan Cox <[EMAIL PROTECTED]> writes:

> > > Specifically
> > > 1.  If the receiver closes and there is unread data many TCP's forget
> > >     to RST the sender to indicate that data was lost.
> > 
> > Do at least FreeBSD, Solaris and NT send RST correctly?
> 
> I don't believe so

There is also a different bug in Linux that makes the application not notice
errors. When it does a close() and an error occurs while flushing buffered
data and doing the FIN handshake it is not returned by close() (no matter
if linger time hits or not). Fortunately, most transaction applications like SMTP
use their own ACKing protocol, which works around that.

-Andi
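
An application-level ACK of the kind mentioned above might look like the
following sketch. Everything here is an assumption made for illustration:
the 4-byte length prefix, the one-byte ACK value, and the function names
are hypothetical and do not correspond to SMTP or any other real protocol.

```python
import socket
import struct

ACK = b"\x06"  # hypothetical one-byte acknowledgement chosen for this sketch


def recv_exact(sock, n):
    """Read exactly n bytes or raise ConnectionError on early EOF."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def send_with_ack(sock, payload, timeout=5.0):
    """Send one length-prefixed message; return True only if the peer acks it."""
    sock.settimeout(timeout)
    try:
        sock.sendall(struct.pack("!I", len(payload)) + payload)
        return recv_exact(sock, 1) == ACK
    except (socket.timeout, ConnectionError):
        return False    # outcome unknown: the caller must retry or reconcile


def recv_and_ack(sock, process):
    """Receive one message, hand it to process(), then acknowledge it."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    payload = recv_exact(sock, length)
    process(payload)    # finish handling the message before acking it
    sock.sendall(ACK)
    return payload
```

A False return does not mean the message was lost, only that the sender
cannot tell; the receiver may have processed it before the failure.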



Re: Client receives TCP packets but does not ACK

2001-06-17 Thread Dan Podeanu

On Sun, Jun 17, 2001 at 08:17:27PM +0200, Pavel Machek wrote:

> > 2.  There is a flaw in the TCP protocol itself that is extremely unlikely
> > to bite people but can in theory cause wrong data in some unusual
> > circumstances that Ian Heavens found and has yet to be fixed by
> > the keepers of the protocol.

Bit offtopic.

Is there any logical reason why if, given fd is a connected, AF_INET,
SOCK_STREAM socket, and one does a write(fd, buffer, len); close(fd);
to the peer, over a rather slow network (read modem, satellite link, etc),
the data gets lost (the remote receives the disconnect before the last
packet). According to socket(7), even if SO_LINGER is not set, the data
is flushed in the background.

Is it Linux or TCP specific? Or some obvious technical detail I'm missing?

Thanks, Dan.



Re: Client receives TCP packets but does not ACK

2001-06-17 Thread Alan Cox

> > Specifically
> > 1.  If the receiver closes and there is unread data many TCP's forget
> > to RST the sender to indicate that data was lost.
> 
> Do at least FreeBSD, Solaris and NT send RST correctly?

I don't believe so

> > 2.  There is a flaw in the TCP protocol itself that is extremely unlikely
> > to bite people but can in theory cause wrong data in some unusual
> > circumstances that Ian Heavens found and has yet to be fixed by
> > the keepers of the protocol.
> 
> This is interesting; where are details?

http://www.schooner.com/~loverso/Public/Internet-Drafts/draft-heavens-problems-rsts-00.txt

Yes, a 1996 TCP protocol flaw that still hasn't been fixed.




Re: Client receives TCP packets but does not ACK

2001-06-16 Thread Robert Kleemann

In order to figure out what this problem is I'm going to add some
printk statements in the networking code on the client machine.
Hopefully, this will show me what's going on.  My goal is to trace the
receipt of the datagram by tcp, see why/how it's deciding to ack or
not ack, and then trace the sending of the ack.

There are quite a few files that seem to be involved including:
linux/net/ipv4/tcp*.c as well as some important structures in
linux/include/net/sock.h

I'm guessing this is going to take me a while just to figure out where
to look and what to look for.  Can any of you networking gurus save me
some time and suggest some functions to start looking at?

thanx!
Robert




Re: Client receives TCP packets but does not ACK

2001-06-16 Thread Mike Black

OK guys -- how much money are you willing to bet that TCP is guaranteed??
Since you don't want to talk OSI that's OK -- that's just to educate some
people.

Try this: (this is what I ran into years ago and had to argue to death).

#1 Client1 has tcp connection to Server1.  Both machines are setup to retry
connections if they fail.
#2 Server1 has power outage (note that Client1 has absolutely NO idea what
happens until Server1 is back up again no RST -- no nothin').
#3 Client1 finally times out and is able to reconnect to Server1 and thinks
everything is OK (as do all the programmers at our customer who think TCP is
a guaranteed protocol).
#4 Analysis shows numerous transactions have been lost (complete panic by
the customer).

Here's the big question.  Whose fault is it?  Our customer tried to claim
that the TCP stack was at fault on our server (a Windows 3.1 box) because it
"dropped packets" and didn't know about it.  Then they thought that the TCP
stack on their client was at fault because it never showed an error trying
to write to the socket.

After much argument I finally was able to show them (from the author of TCP
whom I emailed for support) that TCP is NOT guaranteed -- it's up to what
you guys are calling the "API" layer (OSI Layer 7) to ENSURE that data
ACTUALLY gets to its intended target.  I was brought in late on this
contract but I never would've implemented the brain-dead protocol (or
actually complete lack of one) for sending transactions across a socket.

You're right in that TCP will work just fine AS LONG AS THERE ARE NO
PROBLEMS.

You can write a program that just opens a socket and blasts data to the
recipient without an error.  And as long as your protocol is session
oriented you'll be fine.  If the session aborts you just resend the whole
thing.

But that does NOT make a robust solution for a transaction oriented protocol
(like the one that started this thread) (contrary to what many people think
AND code up).
P.S. My reference to TCP being at OSI layer 5 is because that's what the API
is for sockets -- Session Layer -- and that's all that people generally
think is needed.  Big mistake if you're transaction-oriented.




Re: Client receives TCP packets but does not ACK

2001-06-15 Thread Robert Kleemann

First a little more data on the problem.

1) I've never seen it with the client and server program on the same
   computer.

2) It only repros on some systems.  If I can repro it on some systems
   and then make the client the server and the server the client the
   bug will often fail to repro.

3) Once it repros it _consistently_ repros seemingly independent of
   kernel versions.

Second, I really don't think it is a problem with the server.  Running
the test a few times I get the following behavior. The client reads up
to message #19 before failing to ack, the server program keeps
executing "prints" until the send buffer fills and then blocks on the
next print at message #25.  "netstat -t" on the server shows the
following

tcp        0   8688 manny-out:20001         glottis:33193           ESTABLISHED

If I run the test a few more times, the client will block on a
different message (6,5,10,etc) and the server program will continue to
fill the buffer with the same amount of data before blocking.

What is more interesting to me is that if I add print statements
before and after the client "recv" call then it appears that the
client is blocking within the receive call.  If I packet sniff the
client then I see lots of packets from the server with happy acks from
the client until one packet does not receive an ack.  The server then
does the right thing and resends the non-acked packet at increasing
time intervals.  Eventually the client sends an ack for the previously
received packet (not the most recent one, which is being resent).  This
exchange continues indefinitely.  I've appended a commented packet log
to the end of this email.

So the client app is in the recv call waiting for data and the
interface is receiving the packet that should satisfy that block.  Why
would the client not be sending an ack?  If the buffer is full then
shouldn't the thread be returning from the recv call and releasing
some of that data?

One person emailed me that a possible gotcha is SIGINT somehow being
triggered by the system call, but I added a trap to the original java app
and now to the little perl script, and it never seems to be triggered.

Any other ideas for areas to investigate?

Robert.

Here are the logs.

packet A sent
20:15:21.703718 < manny.20001 > glottis-in.33088: P 14729:14984(255) ack 1 win 5792 (DF)

packet B sent
20:15:21.713719 < manny.20001 > glottis-in.33088: . 14984:16432(1448) ack 1 win 5792 (DF)

ack up to packet B
20:15:21.713719 > glottis-in.33088 > manny.20001: . 1:1(0) ack 16432 win 34752 (DF)

packet C sent
20:15:21.713719 < manny.20001 > glottis-in.33088: P 16432:16714(282) ack 1 win 5792 (DF)

packet D sent
20:15:21.713719 < manny.20001 > glottis-in.33088: . 16714:18162(1448) ack 1 win 5792 (DF)

ack up to packet D
20:15:21.713719 > glottis-in.33088 > manny.20001: . 1:1(0) ack 18162 win 37648 (DF)

packet E sent
20:15:21.723719 < manny.20001 > glottis-in.33088: P 18162:18420(258) ack 1 win 5792 (DF)

packet F sent
20:15:21.733719 < manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 (DF)

packet G sent
20:15:21.743719 < manny.20001 > glottis-in.33088: . 19868:21316(1448) ack 1 win 5792 (DF)

ack up to packet E
20:15:21.743719 > glottis-in.33088 > manny.20001: . 1:1(0) ack 18420 win 37648 (DF)

packet H sent
20:15:21.743719 < manny.20001 > glottis-in.33088: P 21316:22139(823) ack 1 win 5792 (DF)

ack up to packet E
20:15:21.743719 > glottis-in.33088 > manny.20001: . 1:1(0) ack 18420 win 37648 (DF)

packet I sent
20:15:21.763720 < manny.20001 > glottis-in.33088: . 22139:23587(1448) ack 1 win 5792 (DF)

ack up to packet E
20:15:21.763720 > glottis-in.33088 > manny.20001: . 1:1(0) ack 18420 win 37648 (DF)

resend packet F many times in increasing time intervals. Why does the
client not ack this?
20:15:21.763720 < manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 (DF)
20:15:21.973726 < manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 (DF)
20:15:22.413738 < manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 (DF)
20:15:23.293761 < manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 (DF)
20:15:25.053809 < manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 (DF)
20:15:28.573905 < manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 (DF)
20:15:35.614095 < manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 (DF)

ack up to packet E
20:15:41.964268 > glottis-in.33088 > manny.20001: F 1:1(0) ack 18420 win 37648 (DF)

Not sure what this is for...
20:15:41.964268 < manny.20001 > glottis-in.33088: . 23587:23587(0) ack 2 win 5792 (DF)
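
The "increasing time intervals" in the log can be checked with a few lines
of arithmetic. The sketch below only restates timestamps already present in
the log (the seven transmissions of segment 18420:19868 listed under
"resend packet F"); the variable names are arbitrary. The gaps come out at
roughly 0.21, 0.44, 0.88, 1.76, 3.52 and 7.04 seconds, i.e. each gap is
about double the previous one, which is what a sender backing off its
retransmission timer would be expected to produce.

```python
# Seconds past 20:15:00 for each resend of 18420:19868, copied from the log.
retransmit_times = [
    21.763720, 21.973726, 22.413738, 23.293761,
    25.053809, 28.573905, 35.614095,
]

# Time between consecutive resends.
gaps = [later - earlier
        for earlier, later in zip(retransmit_times, retransmit_times[1:])]

# How much longer each gap is than the one before it.
ratios = [b / a for a, b in zip(gaps, gaps[1:])]

for gap in gaps:
    print(f"{gap:.3f}s")
print([round(r, 2) for r in ratios])
```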




Re: Client receives TCP packets but does not ACK

2001-06-15 Thread Gérard Roudier



On Fri, 15 Jun 2001, Mike Black wrote:

> This is a very common misconception -- I worked a contract many years ago
> where I actually had to quote the author of TCP to convince a banking
> company I was working with that TCP is not a guaranteed protocol.
> Guaranteed delivery at layer 5 - yes -- but NOT a guaranteed protocol.
> 
> Guaranteed means that there is absolutely NO way that data can be dropped by
> an application if either sender or receiver screws up.
> 
> The only way to do this is at layer 7 of the OSI model -- even then you end
> up making assumptions.

You are mixing oranges (protocols) and apples (implementations and APIs)
here.

The layer that is expected to provide reliable end to end communication is
layer 4 (transport layer). TCP, at least in theory, is as good as OSI
transport in providing reliable end to end communication. 

> Here's some examples for layer 5 (which TCP operates at) but talking at
> Layer 7:
> 
> #1 - You send() data -- meanwhile the receiver terminates the connection --
> what happened to the data?  It's gone!  Your app never receives feedback
> that it didn't send() correctly.  You'll see the reset on the next read but
> you don't know what happened to the data.
> #2 - You send() data and overrun your IP queue -- nobody will ever know the
> difference without a layer 7 protocol (or in the case quoted in this
> subject it might lock up).
> #3 - You send() data and either machine has bad RAM and flips a bit -- guess
> what? -- data corruption.
> 
> Even when you do layer 7 (with checksums and ack/nak) you make assumptions:
> 
> #1 - You checksum the packet you just received -- what's to say a bit can't
> flip?
> 
> TCP may be guaranteed at layer 5 but we don't typically program at layer
> 5 -- we program at layer 7 and then lots of people assume they're doing it
> at layer 5 -- ergo the problems.

Layers above layer 4 provide additional services for applications but
they assume that layer 4 is reliable. In other words, a broken transport
layer breaks all layers above it and thus the applications.

In fact, when you build your application above layer 4 and need services
normally provided by upper OSI layers, you have to implement equivalent
services in your application, using layered protocols or not.
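
One such service is message framing: TCP delivers a byte stream, so an application that wants discrete messages has to mark the boundaries itself. A minimal Python sketch follows, assuming a 4-byte big-endian length prefix; the prefix format and the names recv_exact, send_msg and recv_msg are illustrative and not taken from any program in this thread.

```python
import socket
import struct

# Assumed wire format: 4-byte big-endian length, then the payload.
HEADER = struct.Struct("!I")


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError(f"peer closed with {remaining} bytes outstanding")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_msg(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_msg(sock):
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

The receiver learns about a truncated message because recv_exact raises instead of returning short data; whether the sender also needs an application-level acknowledgement is a separate design decision.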

> To look at it another way -- "Just 'cuz I told my C library to send a packet
> doesn't mean it's going to work".
> For example, if you're using non-blocking sockets you have to check to
> ensure there's room in your IP queue to transmit.

That's an API semantics issue, not a protocol issue.

  Gérard.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Client receives TCP packets but does not ACK

2001-06-15 Thread Alan Cox

> TCP is guaranteed delivery at layer 5 -- but that's all -- not a "guaranteed
> protocol"

In certain specific cases even this is not true, and the same goes for
many, many implementations.

Specifically:
1.  If the receiver closes and there is unread data, many TCPs forget
to RST the sender to indicate that data was lost.

2.  There is a flaw in the TCP protocol itself that is extremely unlikely
to bite people but can in theory cause wrong data in some unusual
circumstances. Ian Heavans found it, and it has yet to be fixed by
the keepers of the protocol.





Re: Client receives TCP packets but does not ACK

2001-06-15 Thread Albert D. Cahalan

Mike Black writes:

> I'm concerned that you're probably just overrunning your IP stack:
...
> TCP is NOT a guaranteed protocol -- you can't just blast data from one port
> to another and expect it to work.

Yes you can. This is why we have TCP in fact.

> a tcp-write is NOT guaranteed -- and as you've seen -- a recv() isn't either
> (that's why you need timeouts).
> You're probably overrunning the tcp buffer on your "print" statement and
> truncating a block.
> I don't see where you're checking for EAGAIN or EWOULDBLOCK (see man
> send).

You do have to check for partial writes due to the UNIX API.
Then check for EAGAIN and EINTR at least.
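
A minimal sketch of such a write loop, in Python rather than the Perl of the test program. The helper name send_all and the 5-second timeout are assumptions for illustration. It retries after partial writes, waits for the socket to become writable on EAGAIN/EWOULDBLOCK, and retries on EINTR.

```python
import select
import socket


def send_all(sock, data, timeout=5.0):
    """Write all of data, coping with partial writes, EAGAIN and EINTR.

    Returns the number of bytes sent; raises TimeoutError if the peer
    stops draining the connection for longer than timeout seconds.
    """
    view = memoryview(data)
    sent_total = 0
    while sent_total < len(view):
        try:
            # send() may accept fewer bytes than offered: a partial write.
            sent_total += sock.send(view[sent_total:])
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: the send buffer is full, wait for room.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("peer is not draining the connection")
        except InterruptedError:
            # EINTR: Python 3.5+ usually retries by itself; kept for clarity.
            continue
    return sent_total
```

On a blocking socket the loop only ever sees partial writes; the EAGAIN branch matters once the socket is switched to non-blocking mode.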

> You need a layer-7 protocol that will guarantee your transactions -- once
> your client acks/naks your server I'll bet everything works hunky-dory.
> If you're not familiar with the OSI model
> http://www.csihq.com/~mike/students/networking/iso/isomodel.html

You don't need that crap. TCP/IP doesn't even fit the OSI model,
and we're missing much of the OSI stack AFAIK. (Do we have that
thing with 10-byte addresses? I think not.)



Re: Client receives TCP packets but does not ACK

2001-06-15 Thread Mike Black

This is a very common misconception -- I worked a contract many years ago
where I actually had to quote the author of TCP to convince a banking
company I was working with that TCP is not a guaranteed protocol.
Guaranteed delivery at layer 5 - yes -- but NOT a guaranteed protocol.

Guaranteed means that there is absolutely NO way that data can be dropped by
an application if either sender or receiver screws up.

The only way to do this is at layer 7 of the OSI model -- even then you end
up making assumptions.

Here's some examples for layer 5 (which TCP operates at) but talking at
Layer 7:

#1 - You send() data -- meanwhile the receiver terminates the connection --
what happened to the data?  It's gone!  Your app never receives feedback
that it didn't send() correctly.  You'll see the reset on the next read but
you don't know what happened to the data.
#2 - You send() data and overrun your IP queue -- nobody will ever know the
difference without a layer 7 protocol (or in the case quoted in this
subject it might lock up).
#3 - You send() data and either machine has bad RAM and flips a bit -- guess
what? -- data corruption.

Even when you do layer 7 (with checksums and ack/nak) you make assumptions:

#1 - You checksum the packet you just received -- what's to say a bit can't
flip?

TCP may be guaranteed at layer 5 but we don't typically program at layer
5 -- we program at layer 7 and then lots of people assume they're doing it
at layer 5 -- ergo the problems.

To look at it another way -- "Just 'cuz I told my C library to send a packet
doesn't mean it's going to work".
For example, if you're using non-blocking sockets you have to check to
ensure there's room in your IP queue to transmit.

TCP is guaranteed delivery at layer 5 -- but that's all -- not a "guaranteed
protocol"

Michael D. Black   Principal Engineer
[EMAIL PROTECTED]  321-676-2923,x203
http://www.csihq.com  Computer Science Innovations
http://www.csihq.com/~mike  My home page
FAX 321-676-2355
- Original Message -
From: "Heusden, Folkert van" <[EMAIL PROTECTED]>
To: "Mike Black" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, June 15, 2001 8:53 AM
Subject: RE: Client receives TCP packets but does not ACK


> TCP is NOT a guaranteed protocol -- you can't just blast data from one
> port to another and expect it to work.

Isn't it? Are you really sure about that? I thought UDP was the
not-guaranteed-one and TCP was the one guaranteeing that all data reaches the
other end in order and all. Please enlighten me.




RE: Client receives TCP packets but does not ACK

2001-06-15 Thread Heusden, Folkert van

> TCP is NOT a guaranteed protocol -- you can't just blast data from one
> port to another and expect it to work.

Isn't it? Are you really sure about that? I thought UDP was the
not-guaranteed-one and TCP was the one guaranteeing that all data reaches the
other end in order and all. Please enlighten me.




Re: Client receives TCP packets but does not ACK

2001-06-15 Thread Mike Black

Here's the end of my run -- I assume this means my config works OK?
I'm on a dual PIII/600 linux-2.4.6-pre3 -- ran it all on the local host.

received msg#90, name pad1, 1 blocks, 12 total bytes
received msg#91, name pad1, 1 blocks, 12 total bytes
received msg#92, name class tande.server.ClientMap, 1624 blocks, 3244 total bytes
received msg#93, name pad1, 1 blocks, 12 total bytes
received msg#94, name pad1, 1 blocks, 12 total bytes
received msg#95, name pad1, 1 blocks, 12 total bytes
received msg#96, name class tande.server.ClientMap, 1624 blocks, 3244 total bytes
successfully read all blocks

I'm concerned that you're probably just overrunning your IP stack:
  foreach $block (@blocks) {
    print $client $block;
    $bytes += length($block);
  }

TCP is NOT a guaranteed protocol -- you can't just blast data from one port
to another and expect it to work.
a tcp-write is NOT guaranteed -- and as you've seen -- a recv() isn't either
(that's why you need timeouts).
You're probably overrunning the tcp buffer on your "print" statement and
truncating a block.
I don't see where you're checking for EAGAIN or EWOULDBLOCK (see man
send).
Not real sure how to do this in perl...


You need a layer-7 protocol that will guarantee your transactions -- once
your client acks/naks your server I'll bet everything works hunky-dory.
If you're not familiar with the OSI model
http://www.csihq.com/~mike/students/networking/iso/isomodel.html

Michael D. Black   Principal Engineer
[EMAIL PROTECTED]  321-676-2923,x203
http://www.csihq.com  Computer Science Innovations
http://www.csihq.com/~mike  My home page
FAX 321-676-2355
- Original Message -
From: "Robert Kleemann" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, June 14, 2001 11:50 PM
Subject: Re: Client receives TCP packets but does not ACK


A couple people have requested a test case.

The problem first showed up in a very large java app.  Since then I
wrote a small perl program to duplicate the behavior of the large app
by sending the same data, in the same order, in the same sized blocks,
from the server to the client.

If you want to test this on your configuration then download the
following two files:
http://www.kleemann.org/crap/clientserver
http://www.kleemann.org/crap/log1e1.txt

Place a copy of the files in the same directory on both the client and
the server and run the program the following way:

[server]$ ./clientserver -s log1e1.txt
listening on port 20001

[client]$ ./clientserver -c serverhostname log1e1.txt

The server will attempt to send the data to the client which then
verifies each byte received.

My client generally stops ack-ing between messages 15 and 25.

Robert.




Re: Client receives TCP packets but does not ACK

2001-06-15 Thread Robert Kleemann

First a little more data on the problem.

1) I've never seen it with the client and server program on the same
   computer.

2) It only repros on some systems.  If I can repro it on some systems
   and then make the client the server and the server the client the
   bug will often fail to repro.

3) Once it repros it _consistently_ repros seemingly independent of
   kernel versions.

Second, I really don't think it is a problem with the server.  Running
the test a few times I get the following behavior. The client reads up
to message #19 before failing to ack, the server program keeps
executing prints until the send buffer fills and then blocks on the
next print at message #25.  netstat -t on the server shows the
following

tcp        0   8688 manny-out:20001         glottis:33193           ESTABLISHED

If I run the test a few more times, the client will block on a
different message (6,5,10,etc) and the server program will continue to
fill the buffer with the same amount of data before blocking.
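
The "writer keeps going until the send buffer fills" behaviour can be reproduced without a network. The Python sketch below is only an illustration: it uses a local socket pair instead of the two hosts in this report, a non-blocking writer so that the point of blocking shows up as an error instead of a hang, and a made-up helper name.

```python
import socket


def bytes_until_blocked(sock, chunk=b"x" * 1024):
    """Send on a non-blocking socket until the kernel refuses more data."""
    sock.setblocking(False)
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            # The send buffer is full; a blocking socket would stall here.
            return total


# A local socket pair stands in for the client/server link (an assumption);
# the reader side deliberately never calls recv().
writer, reader = socket.socketpair()
first = bytes_until_blocked(writer)
print(f"writer blocked after {first} bytes")
```

The byte count depends on the buffer sizes of the system it runs on, so it will not match the 8688 bytes in the netstat output above.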

What is more interesting to me is that if I add print statements
before and after the client recv call then it appears that the
client is blocking within the receive call.  If I packet sniff the
client then I see lots of packets from the server with happy acks from
the client until one packet does not receive an ack.  The server then
does the right thing and resends the non-acked packet at increasing
time intervals.  Eventually the client sends an ack for the previously
received packet (not the most recent one, which is being resent).  This
exchange continues indefinitely.  I've appended a commented packet log
to the end of this email.
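
The "increasing time intervals" can be measured from the timestamps of the resent segment in the trace at the end of this message. A small Python sketch; the last timestamp comes from a line that is cut off in the archive, so treating it as another resend is an assumption.

```python
from datetime import datetime

# Timestamps of the repeated 18420:19868 segment, copied from the trace.
# The final entry is truncated in the archive (assumed to be a resend).
RESENDS = [
    "20:15:21.763720",
    "20:15:21.973726",
    "20:15:22.413738",
    "20:15:23.293761",
    "20:15:25.053809",
    "20:15:28.573905",
]


def gaps_in_seconds(stamps):
    """Return the delay between each pair of consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


gaps = gaps_in_seconds(RESENDS)
ratios = [later / earlier for earlier, later in zip(gaps, gaps[1:])]
for gap in gaps:
    print(f"{gap:.2f} s")
```

The gaps come out at roughly 0.21, 0.44, 0.88, 1.76 and 3.52 seconds. Each is about double the previous one, which is consistent with the sender backing off its retransmission timer exponentially.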

So the client app is in the recv call waiting for data and the
interface is receiving the packet that should satisfy that block.  Why
would the client not be sending an ack?  If the buffer is full then
shouldn't the thread be returning from the recv call and releasing
some of that data?
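
Read against the trace at the end of this message, the last ACK carries ack 18420 plus a SACK block of 19868:23587. The client therefore already holds everything after packet F and is missing only F itself. TCP hands data to the application strictly in order, so recv() cannot return the queued bytes until that hole is filled. A small Python sketch of the arithmetic (the function name is made up for illustration):

```python
def holes(cumulative_ack, sack_blocks):
    """Return the byte ranges the receiver is still missing.

    cumulative_ack: next byte the receiver expects in order.
    sack_blocks: (start, end) ranges received out of order, end exclusive.
    """
    missing = []
    cursor = cumulative_ack
    for start, end in sorted(sack_blocks):
        if start > cursor:
            missing.append((cursor, start))
        cursor = max(cursor, end)
    return missing


# Values from the last ACK in the trace: ack 18420, sack {19868:23587}.
print(holes(18420, [(19868, 23587)]))  # [(18420, 19868)]
```

The single hole is 1448 bytes long, exactly the segment labelled packet F.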

One person emailed me a possible gotcha: that SIGINT is somehow being
triggered by the system call. I added a trap to the original java app
and now to the little perl script, and it seems to never be triggered.

Any other ideas for areas to investigate?

Robert.

Here are the logs.

packet A sent
20:15:21.703718  manny.20001 > glottis-in.33088: P 14729:14984(255) ack 1 win 5792 <nop,nop,timestamp 925740 16355479> (DF)

packet B sent
20:15:21.713719  manny.20001 > glottis-in.33088: . 14984:16432(1448) ack 1 win 5792 <nop,nop,timestamp 925741 16355479> (DF)

ack up to packet B
20:15:21.713719  glottis-in.33088 > manny.20001: . 1:1(0) ack 16432 win 34752 <nop,nop,timestamp 16355480 925740> (DF)

packet C sent
20:15:21.713719  manny.20001 > glottis-in.33088: P 16432:16714(282) ack 1 win 5792 <nop,nop,timestamp 925741 16355480> (DF)

packet D sent
20:15:21.713719  manny.20001 > glottis-in.33088: . 16714:18162(1448) ack 1 win 5792 <nop,nop,timestamp 925742 16355480> (DF)

ack up to packet D
20:15:21.713719  glottis-in.33088 > manny.20001: . 1:1(0) ack 18162 win 37648 <nop,nop,timestamp 16355480 925741> (DF)

packet E sent
20:15:21.723719  manny.20001 > glottis-in.33088: P 18162:18420(258) ack 1 win 5792 <nop,nop,timestamp 925742 16355480> (DF)

packet F sent
20:15:21.733719  manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 <nop,nop,timestamp 925743 16355480> (DF)

packet G sent
20:15:21.743719  manny.20001 > glottis-in.33088: . 19868:21316(1448) ack 1 win 5792 <nop,nop,timestamp 925744 16355480> (DF)

ack up to packet E
20:15:21.743719  glottis-in.33088 > manny.20001: . 1:1(0) ack 18420 win 37648 <nop,nop,timestamp 16355483 925742,nop,nop, sack 1 {19868:21316} > (DF)

packet H sent
20:15:21.743719  manny.20001 > glottis-in.33088: P 21316:22139(823) ack 1 win 5792 <nop,nop,timestamp 925744 16355483> (DF)

ack up to packet E
20:15:21.743719  glottis-in.33088 > manny.20001: . 1:1(0) ack 18420 win 37648 <nop,nop,timestamp 16355483 925742,nop,nop, sack 1 {19868:22139} > (DF)

packet I sent
20:15:21.763720  manny.20001 > glottis-in.33088: . 22139:23587(1448) ack 1 win 5792 <nop,nop,timestamp 925746 16355483> (DF)

ack up to packet E
20:15:21.763720  glottis-in.33088 > manny.20001: . 1:1(0) ack 18420 win 37648 <nop,nop,timestamp 16355485 925742,nop,nop, sack 1 {19868:23587} > (DF)

resend packet F many times in increasing time intervals. Why does the
client not ack this?
20:15:21.763720  manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 <nop,nop,timestamp 925746 16355485> (DF)
20:15:21.973726  manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 <nop,nop,timestamp 925768 16355485> (DF)
20:15:22.413738  manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 <nop,nop,timestamp 925812 16355485> (DF)
20:15:23.293761  manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 <nop,nop,timestamp 925900 16355485> (DF)
20:15:25.053809  manny.20001 > glottis-in.33088: . 18420:19868(1448) ack 1 win 5792 <nop,nop,timestamp 926076 16355485> (DF)
20:15:28.573905  manny.20001 > glottis-in.33088: 

Re: Client receives TCP packets but does not ACK

2001-06-14 Thread Robert Kleemann

A couple people have requested a test case.

The problem first showed up in a very large java app.  Since then I
wrote a small perl program to duplicate the behavior of the large app
by sending the same data, in the same order, in the same sized blocks,
from the server to the client.

If you want to test this on your configuration then download the
following two files:
http://www.kleemann.org/crap/clientserver
http://www.kleemann.org/crap/log1e1.txt

Place a copy of the files in the same directory on both the client and
the server and run the program the following way:

[server]$ ./clientserver -s log1e1.txt
listening on port 20001

[client]$ ./clientserver -c serverhostname log1e1.txt

The server will attempt to send the data to the client which then
verifies each byte received.

My client generally stops ack-ing between messages 15 and 25.

Robert.




Re: Client receives TCP packets but does not ACK

2001-06-13 Thread Robert Kleemann

On 13 Jun 2001, Andi Kleen wrote:
> The packet likely doesn't fit into the socket buffer and is silently
> dropped. The TCP stack doesn't force an ACK in this case, but it
> probably should, although it wouldn't solve the deadlock. The deadlock
> will only be solved if the local application reads data and clears the
> socket buffer. If you have a single packet that is bigger than the
> empty socket buffer / 2 you lose.
>
> You can check the allocated socket buffer size using netstat.

Thanks for the quick response!

I tried most of the netstat options and was unable to see the buffer size.
I do see the Recv-Q and the Send-Q which are usually zero except when the
client stops ack-ing and then the server's Send-Q starts filling up.
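
Recv-Q and Send-Q report how many bytes are queued, not how large the buffers may grow. One way to see the per-socket limits from inside a program is getsockopt(). The sketch below is in Python, not the Perl or Java of the programs under test, and the helper name is made up for illustration.

```python
import socket


def buffer_sizes(sock):
    """Return (receive, send) buffer limits in bytes as the kernel reports them."""
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print("rcvbuf=%d sndbuf=%d" % buffer_sizes(s))
```

The numbers describe a freshly created socket on the machine running the script; an established connection may report different values.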

> You can increase it using the /proc/sys/net/core/rmem_{default,max}
> sysctls; in 2.4 there is also a TCP memory limit that can be tuned
> using /proc/sys/net/ipv4/tcp_mem. Doubling one of these will probably
> fix your problems.

On the client:
/proc/sys/net/core/rmem_default = 65535
/proc/sys/net/core/rmem_max = 65535
/proc/sys/net/ipv4/tcp_mem = 48128  48640   49152

On the server:
/proc/sys/net/core/rmem_default = 65535
/proc/sys/net/core/rmem_max = 65535
/proc/sys/net/ipv4/tcp_mem = 23552  24064   24576

The "bad" packet that seems to cause all the problems is only 1448
bytes long so I don't think insufficient buffers is the problem.
After the client stops ack-ing I can watch the server's Send-Q slowly
rise 2K, 4K, 6K, but it never comes close to these buffer limits.

Robert.




Re: Client receives TCP packets but does not ACK

2001-06-13 Thread Andi Kleen

Robert Kleemann <[EMAIL PROTECTED]> writes:

> I have a client server program that opens a tcp connection between two
> machines.  Everything is fine until a certain type of data is sent
> across the socket at which point the client refuses to ACK and the
> server continues to resend the packets to no avail.
> 
> I've verified that the client is blocking on a socket read (and not
> coming out) I've also run "tcpdump -lxa -s 5000" on each machine and
> verified that each packet sent by each machine is received by the
> other.  I diffed the data and there appears to be no corruption.
> 
> I first saw this with the server running 2.4.2 and the client running
> 2.2.16 but I have since upgraded the server first to 2.4.5 and then
> also added a patch from 2.4.6-pre2 that had to do with tcp acks.  The
> bug still repros.  I have also upgraded the client to 2.4.2, 2.4.5,
> and 2.4.5 + ack patch with no luck.
> 
> There have been quite a few other people who have experienced these
> symptoms and posted to the list over the past 5 months or so.  I
> haven't seen a resolution for any of them except for requests to try
> the latest kernel since there have been a lot of networking fixes in
> the latest kernels.  I have appended links to these other postings at
> the end of this email in case their data might help.
> 
> I can consistently reproduce this problem on my machines (10 Mb/s
> ethernet lan) and would really like to narrow this bug down to the
> source instead of trying the latest kernels and hoping that they solve
> the problem. The networking code (net/ipv4/tcp*.c) is daunting to me
> but if someone has any suggestions on good places to add debug code,
> building a debug version, or whatever, I can try it on my local system
> and investigate further.  This bug is driving me crazy and I want to
> find it and fix it!
> 
> Are there any other details that would help?  My hardware
> configuration? Network settings? etc?

The packet likely doesn't fit into the socket buffer and is silently
dropped. The TCP stack doesn't force an ACK in this case, but it
probably should, although that wouldn't solve the deadlock. The deadlock
will only be resolved once the local application reads data and drains
the socket buffer. If you have a single packet that is bigger than half
of the empty socket buffer, you lose.

You can check the allocated socket buffer size using netstat.  
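
If netstat only shows the queue lengths (Recv-Q and Send-Q), one alternative is to ask for the buffer size from inside the program with getsockopt(). The following is a minimal Python sketch, not the method used in this thread; the helper name `receive_buffer_size` is made up, and it inspects a fresh unconnected socket, so it shows the default, not the state of the stalled connection.

```python
import socket


def receive_buffer_size(sock: socket.socket) -> int:
    """Return the receive buffer size the kernel reports for this socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    # An unconnected TCP socket is enough to see the default size.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("SO_RCVBUF:", receive_buffer_size(s))
```

In the real client the same call would be made on the connected socket that blocks in read.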

You can increase it using the /proc/sys/net/core/rmem_{default,max} 
sysctls; in 2.4 there is also a TCP memory limit that can be tuned
using /proc/sys/net/ipv4/tcp_mem. Doubling one of these will probably
fix your problems. 
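
As a rough illustration of the tuning step, the sketch below reads the three sysctls named above and prints what a doubled setting would be. It is a Python sketch under stated assumptions: the helper names (`parse_sysctl`, `read_sysctl`, `doubled`) are invented here, it only reads and never writes, and it assumes a Linux /proc/sys layout.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")  # Linux only


def parse_sysctl(text: str) -> list[int]:
    """Parse the whitespace-separated integers of a sysctl file."""
    return [int(field) for field in text.split()]


def read_sysctl(name: str) -> list[int]:
    """Read e.g. 'net/core/rmem_max' from /proc/sys."""
    return parse_sysctl((PROC_SYS / name).read_text())


def doubled(values: list[int]) -> list[int]:
    """Return each value multiplied by two."""
    return [2 * v for v in values]


if __name__ == "__main__":
    names = ("net/core/rmem_default", "net/core/rmem_max", "net/ipv4/tcp_mem")
    for name in names:
        try:
            current = read_sysctl(name)
        except OSError as exc:
            print(f"{name}: not readable ({exc})")
            continue
        print(f"{name}: current={current} doubled={doubled(current)}")
```

Applying a new value means writing it back to the same file as root; that step is deliberately left out of the sketch.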

Normally the socket buffer should not overflow if the sender honors 
the TCP window protocol, but there are some corner cases where it can
still happen, e.g. when the application sends lots of small packets
(which all have fixed metadata overhead) or the device driver always
hands full MTU sized packets to the stack.
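
The small-packet corner case can be illustrated with simple arithmetic. In the Python sketch below, every queued packet is charged its payload plus a fixed overhead against the receive buffer. The overhead of 256 bytes is a hypothetical figure chosen for illustration, not a kernel constant, and `payload_capacity` is a made-up helper.

```python
def payload_capacity(rcvbuf: int, payload_len: int, overhead: int) -> int:
    """Payload bytes that fit when each packet is charged payload_len + overhead."""
    packets = rcvbuf // (payload_len + overhead)
    return packets * payload_len


RCVBUF = 65535
OVERHEAD = 256  # hypothetical per-packet bookkeeping cost, NOT a kernel constant

for payload_len in (1448, 64, 8):
    usable = payload_capacity(RCVBUF, payload_len, OVERHEAD)
    print(f"payload {payload_len:5d} bytes -> {usable:6d} payload bytes fit")
```

Under these assumed numbers, the smaller the packets, the less real data fits before the buffer is exhausted, even though the advertised window is sized in payload bytes.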

2.4.4+ has some fixes that should make these corner cases less likely.
Unfortunately, the problem cannot be solved completely.


-Andi




Client receives TCP packets but does not ACK

2001-06-12 Thread Robert Kleemann

I have a client server program that opens a tcp connection between two
machines.  Everything is fine until a certain type of data is sent
across the socket at which point the client refuses to ACK and the
server continues to resend the packets to no avail.

I've verified that the client is blocking on a socket read (and not
coming out).  I've also run "tcpdump -lxa -s 5000" on each machine and
verified that each packet sent by each machine is received by the
other.  I diffed the data and there appears to be no corruption.

I first saw this with the server running 2.4.2 and the client running
2.2.16 but I have since upgraded the server first to 2.4.5 and then
also added a patch from 2.4.6-pre2 that had to do with tcp acks.  The
bug still repros.  I have also upgraded the client to 2.4.2, 2.4.5,
and 2.4.5 + ack patch with no luck.

There have been quite a few other people who have experienced these
symptoms and posted to the list over the past 5 months or so.  I
haven't seen a resolution for any of them except for requests to try
the latest kernel since there have been a lot of networking fixes in
the latest kernels.  I have appended links to these other postings at
the end of this email in case their data might help.

I can consistently reproduce this problem on my machines (10 Mb/s
ethernet lan) and would really like to narrow this bug down to the
source instead of trying the latest kernels and hoping that they solve
the problem. The networking code (net/ipv4/tcp*.c) is daunting to me
but if someone has any suggestions on good places to add debug code,
building a debug version, or whatever, I can try it on my local system
and investigate further.  This bug is driving me crazy and I want to
find it and fix it!

Are there any other details that would help?  My hardware
configuration? Network settings? etc?

Here is the analysis of one of the tcpdump logs for glottis.  glottis
is the client and manny is the server.  Note that the large packet
11006:12454(1448) is received by glottis and an ack is never sent to
manny.

20:07:45.043640 glottis->manny ack 11006
20:07:45.047120 manny->glottis 11006:12454(1448) ack 408 probably contains the remainder of ClientMap
20:07:45.047571 manny->glottis 12454:12936(482) ack 408
20:07:45.047673 glottis->manny ack 11006
20:07:45.272042 manny->glottis 11006:12454(1448) ack 408 resend
20:07:45.732049 manny->glottis 11006:12454(1448) ack 408 resend
20:07:46.652015 manny->glottis 11006:12454(1448) ack 408 resend
20:07:48.491986 manny->glottis 11006:12454(1448) ack 408 resend
20:07:52.171937 manny->glottis 11006:12454(1448) ack 408 resend
20:07:59.531850 manny->glottis 11006:12454(1448) ack 408 resend
web packets as manny is probably pinging session server
20:08:14.251656 manny->glottis 11006:12454(1448) ack 408 resend
20:08:24.078088 glottis->manny 408:437(29) ack 11006 text request in same packet
20:08:24.110417 manny->glottis ack 437
20:08:27.539778 glottis->manny 437:470(33) ack 11006 quit message
20:08:27.540158 manny->glottis 12936:12936(0) ack 470
20:08:27.541574 glottis->manny 470:472(2) ack 11006
20:08:27.542069 manny->glottis 12936:12936(0) ack 472
20:08:27.637385 manny->glottis 12936:12936(0) ack 473
web packets
ntp packets
20:08:43.691285 manny->glottis 11006:12454(1448) ack 473 resend
arp packets
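
One way to read the trace above is to look at the spacing of the resends of segment 11006:12454. The Python sketch below parses the abbreviated log format used in this message (not raw tcpdump output) and prints the gap between successive transmissions. The regular expression and the helper names `send_times` and `gaps` are assumptions made for this illustration.

```python
import re
from datetime import datetime

# Transmissions of segment 11006:12454, copied from the trace above.
LOG = """\
20:07:45.047120 manny->glottis 11006:12454(1448) ack 408
20:07:45.272042 manny->glottis 11006:12454(1448) ack 408 resend
20:07:45.732049 manny->glottis 11006:12454(1448) ack 408 resend
20:07:46.652015 manny->glottis 11006:12454(1448) ack 408 resend
20:07:48.491986 manny->glottis 11006:12454(1448) ack 408 resend
20:07:52.171937 manny->glottis 11006:12454(1448) ack 408 resend
20:07:59.531850 manny->glottis 11006:12454(1448) ack 408 resend
20:08:14.251656 manny->glottis 11006:12454(1448) ack 408 resend
20:08:43.691285 manny->glottis 11006:12454(1448) ack 473 resend
"""

# timestamp, sender->receiver, first_seq:next_seq(length)
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d\.\d{6}) (\S+)->(\S+) (\d+):(\d+)\((\d+)\)")


def send_times(log: str, start_seq: int) -> list[datetime]:
    """Collect the timestamps of every data segment starting at start_seq."""
    times = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m and int(m.group(4)) == start_seq:
            times.append(datetime.strptime(m.group(1), "%H:%M:%S.%f"))
    return times


def gaps(times: list[datetime]) -> list[float]:
    """Seconds between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps(send_times(LOG, 11006)):
        print(f"{gap:.3f} s")
```

After the first resend the gaps roughly double each time, which is what an exponentially backed-off retransmission timer would produce when the server never sees an ACK covering 11006:12454.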

Here are some other threads on the list that may be related to this problem:

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=ca50bd5b6fab99dd,2&seekm=linux.kernel.3A806260.BB77D017%40denise.shiny.it#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=c2b75d883be146f6,2&seekm=linux.kernel.5.0.2.1.2.20010115152847.00a8a380%40pop.we.mediaone.net#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=5a94424eaed764df,21&seekm=linux.kernel.3A6F3C4A.27E148E9%40colorfullife.com#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=d74b104bfe2da967,14&seekm=200104101738.VAA21467%40ms2.inr.ac.ru#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=c15161c8342be0a0,7&seekm=linux.kernel.Pine.LNX.4.30.0012311601410.9994-10%40shodan.irccrew.org#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=7268b77eb1e07a38,3&seekm=20010419200905.A2970%40ping.be#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=160b098279e28ca9,8&seekm=linux.kernel.F57chplw8IfbyyOxmQp000170f7%40hotmail.com#p

Please cc me on any replies.

thanx!
Robert



