Early on, running on 2.3, we hit a clear deadlock that I never root-caused,
where the cluster simply stopped working.  At the time I was using the same
DataStreamer from multiple threads, and we had tuned up the buffer size
because of that; we were also running against EBS, possibly with timeouts
that were too short.  We have not seen this on 2.4 with one DataStreamer per
producer thread, default parameters, and SSDs.  The problem actually seemed
worse after I followed the Ignite startup message's advice about setting a
message buffer/size limit and specified one.
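For what it's worth, the arrangement that has been stable for us (one
DataStreamer per producer thread, default parameters) looks roughly like the
sketch below.  The cache name "myCache", the key/value types, and the loop
bounds are placeholders, not our actual setup:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class PerThreadStreamer implements Runnable {
    private final Ignite ignite;  // one shared Ignite node/client
    private final int threadId;

    PerThreadStreamer(Ignite ignite, int threadId) {
        this.ignite = ignite;
        this.threadId = threadId;
    }

    @Override public void run() {
        // One DataStreamer per producer thread, default buffer sizes.
        // (Sharing a single streamer across threads is documented as legal,
        // but that was the configuration under which we saw the 2.3 hang.)
        try (IgniteDataStreamer<Long, String> streamer =
                 ignite.dataStreamer("myCache")) {  // cache name is a placeholder
            for (long i = 0; i < 1_000; i++)
                streamer.addData(threadId * 1_000_000L + i, "value-" + i);
        }  // close() flushes any remaining buffered entries
    }
}
```

Closing the streamer in a try-with-resources block matters: close() flushes
the per-node buffers, so entries are not silently left behind when a
producer thread exits.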

One thing still on my list, however, is to understand more about paired TCP
connections and why (or whether) they are the default.  Fundamentally, if
you send bi-directional request/response pairs over a single TCP virtual
circuit, there is an inherent deadlock: responses can get stuck behind
requests that are flow controlled.  With a single VC, the only general
solution is to assume unlimited memory, reading requests off the VC and
queuing them in memory so that responses can still be drained.  You can
bound the receiver's memory usage by limiting, at a higher level, the total
requests that can be outstanding, but as the node count scales the receiver
still needs more memory.  I've been assuming that paired connections are
meant to address this fundamental issue by preventing requests from blocking
responses, but I haven't dug into it yet.  My impression was that paired
connections are not the default.
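The "drain into memory" escape hatch described above can be sketched with
bounded queues standing in for TCP flow-control windows.  Everything here
(class and variable names, queue sizes, message counts) is invented for
illustration; it is not Ignite code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleChannelDemo {
    static final int WINDOW = 4;      // stands in for a TCP flow-control window
    static final int REQUESTS = 100;  // far more than the window holds

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> aToB = new ArrayBlockingQueue<>(WINDOW);
        BlockingQueue<String> bToA = new ArrayBlockingQueue<>(WINDOW);

        // Peer B: a dedicated reader drains the channel unconditionally into
        // (conceptually unbounded) memory, so responses are never stuck
        // behind flow-controlled requests.
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        Thread bReader = new Thread(() -> {
            try {
                for (int i = 0; i < REQUESTS; i++) inbox.put(aToB.take());
            } catch (InterruptedException ignored) {}
        });
        Thread bResponder = new Thread(() -> {
            try {
                for (int i = 0; i < REQUESTS; i++) bToA.put("resp:" + inbox.take());
            } catch (InterruptedException ignored) {}
        });

        // Peer A: writes every request before reading any response.  If B
        // instead read and responded strictly in lockstep on this one
        // channel, both windows would fill and the system would wedge --
        // the deadlock described above.
        Thread aWriter = new Thread(() -> {
            try {
                for (int i = 0; i < REQUESTS; i++) aToB.put("req" + i);
            } catch (InterruptedException ignored) {}
        });

        bReader.start(); bResponder.start(); aWriter.start();
        int got = 0;
        for (int i = 0; i < REQUESTS; i++) { bToA.take(); got++; }
        aWriter.join(); bReader.join(); bResponder.join();
        System.out.println("completed " + got + " request/response pairs");
    }
}
```

If I read the docs right, Ignite's answer is
TcpCommunicationSpi.setUsePairedConnections(true), which keeps separate
connections for incoming and outgoing messages between a pair of nodes so
requests in one direction cannot block responses in the other; my
understanding (to be confirmed) is that it defaults to false.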



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/