Re: [tcpdump-workers] RPC responde code question (print-nfs.c)

2007-09-12 Thread Alex Still
Thanks a lot for that detailed explanation.
Indeed, Wireshark dissected these correctly when using "Decode As..." RPC.

Cheers,

On 9/5/07, Guy Harris [EMAIL PROTECTED] wrote:

 Alex Still wrote:

  We're using tcpdump to try to diagnose an NFS performance problem.
  We're seeing a lot of these :
  [..]
  16:01:19.890794 IP nfs_server.nfs > client_46.1205729087: reply ERR 1448
  16:01:19.890831 IP nfs_server.nfs > client_46.3664003007: reply ERR 1448
  [..]
 
  That seems to be an RPC error,

 It seems to be one, but it isn't necessarily one.

 The code that printed that line printed "reply OK {length}" if the
 field in the packet that would be the reply code, *IF* the packet
 contains the beginning of an ONC RPC reply, is 0, and
 "reply ERR {length}" if it's not zero.

 However, not all packets from an ONC RPC server (such as an NFS server)
 necessarily contain the beginning of an ONC RPC reply, as a reply (or
 request) can require more than one link-layer packet - for example, an
 NFS READ reply that returns more bytes than fit in one Ethernet packet
 will, when sent over Ethernet, take more than one Ethernet packet.


[..]

-- 
Alex
-
This is the tcpdump-workers list.
Visit https://cod.sandelman.ca/ to unsubscribe.


Re: [tcpdump-workers] RPC responde code question (print-nfs.c)

2007-09-05 Thread Guy Harris

Alex Still wrote:


We're using tcpdump to try to diagnose an NFS performance problem.
We're seeing a lot of these :
[..]
16:01:19.890794 IP nfs_server.nfs > client_46.1205729087: reply ERR 1448
16:01:19.890831 IP nfs_server.nfs > client_46.3664003007: reply ERR 1448
[..]

That seems to be an RPC error,


It seems to be one, but it isn't necessarily one.

The code that printed that line printed "reply OK {length}" if the
field in the packet that would be the reply code, *IF* the packet
contains the beginning of an ONC RPC reply, is 0, and
"reply ERR {length}" if it's not zero.
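
To make the point concrete, here is a minimal sketch (in Python, not the actual print-nfs.c code) of that kind of unconditional check. It assumes the ONC RPC reply layout of xid, message type, reply status as big-endian 32-bit words, preceded over TCP by a 4-byte record mark; the function name and the use of the payload length as {length} are my own assumptions for illustration.

```python
import struct

MSG_ACCEPTED = 0  # ONC RPC reply_stat value for an accepted reply


def naive_reply_status(payload: bytes, over_tcp: bool = True) -> str:
    """Label a payload as OK/ERR without checking that it starts a reply."""
    # Over TCP, a 4-byte record mark precedes the RPC message (assumption:
    # the payload starts exactly at that mark, which is the whole problem).
    offset = 4 if over_tcp else 0
    # xid, msg_type, reply_stat: three big-endian 32-bit words.
    _xid, _msg_type, reply_stat = struct.unpack_from("!III", payload, offset)
    label = "OK" if reply_stat == MSG_ACCEPTED else "ERR"
    return f"reply {label} {len(payload)}"


# A segment from the middle of a large READ reply is just file data;
# whatever bytes sit at the "reply status" offset get interpreted anyway.
mid_stream = b"\xff" * 1448
print(naive_reply_status(mid_stream))
```

Any segment whose bytes at that offset happen to be non-zero gets reported as an error, whether or not it is the start of a reply.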


However, not all packets from an ONC RPC server (such as an NFS server) 
necessarily contain the beginning of an ONC RPC reply, as a reply (or 
request) can require more than one link-layer packet - for example, an 
NFS READ reply that returns more bytes than fit in one Ethernet packet 
will, when sent over Ethernet, take more than one Ethernet packet.


If the request or reply is coming over UDP, that's handled by IP 
fragmentation; the first packet of the reply will be the first fragment, 
i.e. it will have a fragment offset of 0, and can be identified by 
looking at the IP header.  tcpdump only attempts to dissect the first 
fragment as anything other than IP (for one thing, subsequent fragments 
don't have UDP or TCP port numbers, so, unless it keeps track of the 
fragments, it has no idea what port number any fragment other than the 
first one was sent from or to), so it won't try to dissect the 
subsequent fragments as if they contained the beginning of an ONC RPC 
request or reply.
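
As a rough illustration of the "fragment offset of 0" test (a sketch only, not tcpdump's code), the flags and fragment offset live in bytes 6-7 of the IPv4 header. The function names below are hypothetical.

```python
import struct


def fragment_info(ip_header: bytes):
    """Return (fragment_offset_in_bytes, more_fragments) from an IPv4 header."""
    # Bytes 6-7: 3 flag bits followed by a 13-bit offset in 8-byte units.
    (flags_frag,) = struct.unpack_from("!H", ip_header, 6)
    more_fragments = bool(flags_frag & 0x2000)  # MF flag
    offset_bytes = (flags_frag & 0x1FFF) * 8
    return offset_bytes, more_fragments


def is_first_fragment(ip_header: bytes) -> bool:
    """Only a packet with fragment offset 0 carries the UDP header."""
    offset_bytes, _more = fragment_info(ip_header)
    return offset_bytes == 0


# Build two bare 20-byte headers, filling in only the flags/offset field.
first = bytearray(20)
struct.pack_into("!H", first, 6, 0x2000)   # offset 0, more fragments follow
later = bytearray(20)
struct.pack_into("!H", later, 6, 185)      # offset 185 * 8 = 1480 bytes

print(is_first_fragment(bytes(first)), is_first_fragment(bytes(later)))
```

Only packets for which this test is true have a UDP header, and hence port numbers, to look at.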


If, however, it's coming over TCP, that's handled by the RPC 
implementation, as TCP merely provides a byte stream, and has no idea 
what the message boundaries are for the protocols that run atop it.


Therefore, the current tcpdump code to dissect RPC requests and replies 
doesn't identify TCP segments that are in the middle of an ONC RPC 
request or reply, and tries to dissect them as if they were at the 
beginning of the request or reply.


This can cause packets to be mis-dissected; the apparent RPC errors 
you're seeing are probably examples of that.
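
For what it's worth, here is a small sketch of why that happens. ONC RPC over TCP delimits messages with a 4-byte record mark (high bit: last fragment of the record; low 31 bits: fragment length). The sketch finds where records begin in a byte stream and shows that most fixed-size segments do not start on such a boundary. The helper names and the 1448-byte segment size (taken from the output above) are illustrative assumptions.

```python
import struct


def record_starts(stream: bytes):
    """Return the offsets in a TCP byte stream where an RPC record begins."""
    starts = []
    pos = 0
    new_record = True
    while pos + 4 <= len(stream):
        (mark,) = struct.unpack_from("!I", stream, pos)
        if new_record:
            starts.append(pos)
        # High bit set means this fragment is the last one of its record,
        # so the next record mark begins a new record.
        new_record = bool(mark & 0x80000000)
        pos += 4 + (mark & 0x7FFFFFFF)  # skip the mark plus the fragment body
    return starts


def segments_at_record_start(stream: bytes, segment_size: int):
    """For each fixed-size segment, say whether it begins at a record start."""
    starts = set(record_starts(stream))
    return [(off, off in starts) for off in range(0, len(stream), segment_size)]


# One 4000-byte record (e.g. a large READ reply) followed by a 100-byte one.
stream = (struct.pack("!I", 0x80000000 | 4000) + bytes(4000)
          + struct.pack("!I", 0x80000000 | 100) + bytes(100))
print(segments_at_record_start(stream, 1448))
```

A dissector that looks at each segment in isolation has no way to know which of those segments begin a record, which is why it needs either reassembly or some sanity check on the contents.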



That part of the code
has changed in tcpdump-3.9.7, which we installed, and now gives :

16:01:19.890794 IP nfs_server.nfs > client_46.1205729087: reply Unknown rpc response code=3128327487 1448
16:01:19.890831 IP nfs_server.nfs > client_46.3664003007: reply Unknown rpc response code=910276799 1448


It now prints the value of the field that, if the packet really *is* an 
RPC error, would be the error code (the code is derived from 
NetBSD's version of tcpdump).  If the packet *isn't* an RPC error, the 
value at the location that's interpreted as a reply code has a very good 
chance of being bogus, as was the case here.



I don't understand what's happening here. Is it something wrong with our NFS
setup,


No.


or I'm misunderstanding the tcpdump output ?


Yes.

Wireshark/TShark should do a better job of this, for two reasons:

	1) it does reassembly of ONC RPC requests and replies over TCP, as well 
as handling TCP segments with multiple requests or replies;


	2) it does some heuristics to check whether a packet *is* an ONC RPC 
request or reply.


1) would probably be a major undertaking to do in tcpdump, but 2) could 
be done:


	for requests, check whether the RPC version field (not the NFS version) 
has the value 2 (neither Sun nor anybody else has released any version 
of ONC RPC other than 2), and check whether the program number in the 
request is the program number for NFS (note that Sun also runs another 
RPC protocol on port 2049 - a protocol to allow access to POSIX ACLs 
over NFS - which will be misdissected if interpreted as NFS);


	for replies, check whether the transaction ID of the reply matches the 
transaction ID of a request we've already seen (tcpdump already does 
that matching for replies without errors, as it needs to know what type 
of request the reply is for).
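
A minimal sketch of those two checks, in Python rather than as a patch to tcpdump. It assumes the record mark (if any) has already been stripped, that the call header is laid out as xid, message type, RPC version, program (big-endian 32-bit words), and that NFS is program 100003; the extra test on the message type and all function names are my additions for illustration.

```python
import struct

CALL, REPLY = 0, 1      # ONC RPC msg_type values
RPC_VERSION = 2         # the only released ONC RPC version
NFS_PROGRAM = 100003    # RPC program number for NFS


def looks_like_nfs_call(msg: bytes) -> bool:
    """Heuristic: does this look like the start of an NFS RPC request?"""
    if len(msg) < 16:
        return False
    _xid, msg_type, rpcvers, prog = struct.unpack_from("!IIII", msg, 0)
    return msg_type == CALL and rpcvers == RPC_VERSION and prog == NFS_PROGRAM


def looks_like_reply(msg: bytes, seen_xids: set) -> bool:
    """Heuristic: does this look like a reply to a request already seen?"""
    if len(msg) < 8:
        return False
    xid, msg_type = struct.unpack_from("!II", msg, 0)
    return msg_type == REPLY and xid in seen_xids


seen = set()
call = struct.pack("!IIIIII", 7, CALL, 2, NFS_PROGRAM, 3, 6)
if looks_like_nfs_call(call):
    seen.add(struct.unpack_from("!I", call, 0)[0])

print(looks_like_reply(struct.pack("!III", 7, REPLY, 0), seen))
print(looks_like_reply(b"\xff" * 1448, seen))
```

A segment of file data would only pass the reply check if its first eight bytes happened to match an outstanding transaction ID and the reply message type, which is far less likely than a single field happening to be non-zero.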


I'll see whether that could be done.