Oh,
I am just reading the thread from the beginning :-)
Maybe we have two bugs here. My bug also reduces performance, but I do not get the 50-second delay; I do see the same messages (hundreds of them).
I start multiple batch jobs from one client (different processes) against one server and one volume.
Here is the rxdebug output from the client, in case it helps somebody:
testblade11:~ # rxdebug localhost 7001 -rxstats -noconns -long
Trying 127.0.0.1 (port 7001):
Free packets: 129, packet reclaims: 0, calls: 55, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
rx stats: free packets 129, allocs 452504, alloc-failures(rcv 0/0,send 575/0,ack 0)
greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
packets read: data 8585 ack 124336 busy 0 abort 0 ackall 0 challenge 53 response 0 debug 1420 params 0 unused 0 unused 0 unused 0 version 0
other read counters: data 8585, ack 124002, dup 0 spurious 333 dally 1
packets sent: data 114805 ack 8529 busy 0 abort 0 ackall 0 challenge 0 response 53 debug 0 params 0 unused 0 unused 0 unused 0 version 0
other send counters: ack 8529, data 870762 (not resends), resends 0, pushed 0, acked&ignored 340943
(these should be small) sendFailed 0, fatalErrors 0
Average rtt is 0.001, with 26815 samples
Minimum rtt is 0.000, maximum is 0.095
1 server connections, 29 client connections, 2 peer structs, 47 call structs, 0 free call structs
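In case somebody wants to compare these counters between two runs, here is a rough sketch that turns the "packets read" / "packets sent" lines into a dictionary. The function name parse_counters is made up for illustration, and the line layout is assumed to be exactly what the output above shows; other rxdebug versions may print it differently.

```python
import re

def parse_counters(line):
    """Split a line like 'packets sent: data 1 ack 2 ...' into (label, counters).

    Assumes the 'label: name value name value ...' layout seen in the
    rxdebug output above. Repeated names such as 'unused' collapse into
    a single key holding the last value seen.
    """
    label, _, rest = line.partition(":")
    pairs = re.findall(r"([A-Za-z&]+) (\d+)", rest)
    return label.strip(), {name: int(value) for name, value in pairs}

# Sample lines copied from the output above.
read_line = ("packets read: data 8585 ack 124336 busy 0 abort 0 ackall 0 "
             "challenge 53 response 0 debug 1420 params 0 unused 0 unused 0 "
             "unused 0 version 0")
sent_line = ("packets sent: data 114805 ack 8529 busy 0 abort 0 ackall 0 "
             "challenge 0 response 53 debug 0 params 0 unused 0 unused 0 "
             "unused 0 version 0")

read_label, read_counters = parse_counters(read_line)
sent_label, sent_counters = parse_counters(sent_line)
print(read_label, read_counters["data"], read_counters["ack"])
print(sent_label, sent_counters["data"], sent_counters["ack"])
```

Saving the dictionaries from two runs and subtracting them would show which counters move while the problem is happening.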
Sven
Sven Oehme/Germany/[EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED] 10/07/05 03:04 PM
Hi Jeffrey,
Peter and I are working on that bug. I have a test environment where I can reproduce it within 2 seconds.
If anybody would like to assist us, I can provide a tcpdump taken while it happens.
Sven
Jeffrey Altman <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED] 10/07/05 02:22 PM
Harald Barth wrote:
> You probably mean stuff like this:
>
> Wed Oct 5 17:31:21 2005 FindClient: client 8320a78(6d5cb8f8) already had conn a7071568 (host 3fdded82), stolen by client 8320a78(6d5cb8f8)
> I have only ONE such log line and not for the time frame in question.
> 3fdded82 is my laptop 130.237.221.63 when at work. But I have no such
> message for any of its other IPs which would be *eded82 (130.237.237.*)
> - my laptop when at home.
This log message is not a symptom of the UUID collision bug that was
fixed. The problem you are seeing may or may not be related to it,
and it may or may not be an actual bug.
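For anyone matching the "host" field of such FindClient lines against client addresses, the example quoted above (3fdded82 for 130.237.221.63) suggests the hex value is the IPv4 address with its bytes in reverse order. A minimal sketch under that assumption follows; the helper name host_hex_to_ip is made up here, and the byte order is inferred from that single example only, so it may differ on other platforms.

```python
import socket

def host_hex_to_ip(host_hex):
    """Convert a hex host field such as '3fdded82' to dotted-quad form.

    Assumption: the bytes are printed in reverse (little-endian) order,
    as inferred from the one example in the quoted log line.
    """
    raw = bytes.fromhex(host_hex)       # b'\x3f\xdd\xed\x82'
    return socket.inet_ntoa(raw[::-1])  # reverse to network byte order

print(host_hex_to_ip("3fdded82"))
```

Under the same assumption, a value ending in eded82 would correspond to an address in 130.237.237.*, which matches the pattern mentioned in the quoted mail.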
> I moved my H.haba.mail volume to another server which allows me to gdb
> and stop the fileserver without being lynched, but of course the
> problems disappeared when I did that. Probably I need to use up some
> kind of resource in the fileserver/rx first. I don't know how without
> letting loose real users. I know I have many connections from many
> clients. But a lot of free threads and no CPU or I/O load to speak of.
> Feel free to run rxdebug against houting.pdc.kth.se if you think you
> see something that I don't. Any tips how to collect statistics?
>
> Harald.
I doubt moving your volume is going to help track down the problem.
You are not going to have lots of other users connecting to the new server.
I don't think we need to be able to stop the service. However, it would
be useful to see what the server is doing in Ethereal.
Jeffrey Altman
