Sven:
Please be specific. What do you mean by "I see the same messages"?
Jeffrey Altman
Sven Oehme wrote:
Oh, just reading the mail from the beginning :-)
Maybe we have two bugs here. My bug also reduces performance, but I have
no 50 sec delay. I do see the same messages (hundreds of them).
I start multiple batch jobs from 1 client (different processes) to
1 server to 1 volume.
rxdebug on the client, if this helps somebody:
testblade11:~ # rxdebug localhost 7001 -rxstats -noconns -long
Trying 127.0.0.1 (port 7001):
Free packets: 129, packet reclaims: 0, calls: 55, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
rx stats: free packets 129, allocs 452504, alloc-failures(rcv 0/0,send 575/0,ack 0)
greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
packets read: data 8585 ack 124336 busy 0 abort 0 ackall 0 challenge 53 response 0 debug 1420 params 0 unused 0 unused 0 unused 0 version 0
other read counters: data 8585, ack 124002, dup 0 spurious 333 dally 1
packets sent: data 114805 ack 8529 busy 0 abort 0 ackall 0 challenge 0 response 53 debug 0 params 0 unused 0 unused 0 unused 0 version 0
other send counters: ack 8529, data 870762 (not resends), resends 0, pushed 0, acked&ignored 340943
(these should be small) sendFailed 0, fatalErrors 0
Average rtt is 0.001, with 26815 samples
Minimum rtt is 0.000, maximum is 0.095
1 server connections, 29 client connections, 2 peer structs, 47 call structs, 0 free call structs
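To compare runs, counters in rxdebug output like the above can be pulled out with a short script. This is only a sketch: it assumes the "name value" layout seen in this particular output, and the function name rx_counters is made up for illustration.

```python
import re

def rx_counters(text, names):
    """Return {name: int} for each counter name that is followed by a number."""
    # Collapse line breaks so a value wrapped onto the next line still matches.
    flat = " ".join(text.split())
    found = {}
    for name in names:
        # First occurrence of the name that is directly followed by digits.
        match = re.search(re.escape(name) + r"\s+(\d+)", flat)
        if match:
            found[name] = int(match.group(1))
    return found

sample = """other send counters: ack 8529,
data 870762 (not resends), resends 0, pushed 0, acked&ignored
340943"""

print(rx_counters(sample, ["resends", "acked&ignored"]))
```

Names that are prefixes of other names (for example "ack" and "ackall") would need a stricter pattern; the sketch does not handle that.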
Sven
Hi Jeffrey,
Peter and I are working on that bug. I have a test environment where I
can reproduce the bug within 2 sec.
If anybody would like to assist us, I can provide a tcpdump taken while
it happens.
Sven
Harald Barth wrote:
> You probably mean stuff like this:
>
> Wed Oct 5 17:31:21 2005 FindClient: client 8320a78(6d5cb8f8) already had conn a7071568 (host 3fdded82), stolen by client 8320a78(6d5cb8f8)
> I have only ONE such log line and not for the time frame in question.
> 3fdded82 is my laptop 130.237.221.63 when at work. But I have no such
> message for any of its other IPs which would be *eded82 (130.237.237.*)
> - my laptop when at home.
This log message is not a symptom of the bug that was fixed related to
UUID collision. This problem you are seeing may or may not be related
and it may or may not be an actual bug.
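For reference, the host value in the FindClient line quoted above reads as an IPv4 address written in hex with the bytes in reverse order, judging from the pair 3fdded82 / 130.237.221.63 given in the quoted message. A small sketch of that conversion follows; the byte order is inferred from that one example, and the helper name host_hex_to_ip is made up for illustration.

```python
import ipaddress

def host_hex_to_ip(host_hex):
    """Convert a hex host field such as '3fdded82' to dotted-quad notation.

    Assumption: the field holds the four address bytes in reverse order,
    as the 3fdded82 / 130.237.221.63 pair in the thread suggests.
    """
    raw = bytes.fromhex(host_hex.zfill(8))           # pad in case leading zeros were dropped
    return str(ipaddress.IPv4Address(raw[::-1]))     # reverse bytes, then format

print(host_hex_to_ip("3fdded82"))  # 130.237.221.63
```

The same reading explains the *eded82 pattern mentioned for 130.237.237.* addresses: the trailing bytes ed ed 82 are 237, 237 and 130.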
> I moved my H.haba.mail volume to another server which allows me to gdb
> and stop the fileserver without being lynched, but of course the
> problems disappeared when I did that. Probably I need to use up some
> kind of resource in the fileserver/rx first. I don't know how without
> letting loose real users. I know I have many connections from many
> clients. But a lot of free threads and no CPU or I/O load to speak of.
> Feel free to run rxdebug against houting.pdc.kth.se if you think you
> see something that I don't. Any tips how to collect statistics?
>
> Harald.
I doubt moving your volume is going to help track down the problem.
You are not going to have lots of other users connecting to the new
server.
I don't think we need to be able to stop the service. However, it would
be useful to see what the server is doing in Ethereal.
Jeffrey Altman