I'm using TcpDiscovery with the nodes running at Amazon. So far the
networking inside the Amazon cloud has been rock solid. That said, over
the weekend I had two compute tasks fail because of an empty
projection. It looks like all of the compute nodes disconnected, but
it's unclear exactly why.
I'm thinking of increasing the number of missed heartbeats, but I'd also
like to increase the amount of logging so that I have more information
to debug similar errors in the future.
I know that Ignite logging can be verbose. Does anyone have suggestions
for how to configure logging so that it captures enough detail to
diagnose the random disconnects without producing a mountain of useless
logs?
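
For what it's worth, here is a rough sketch of the kind of targeted setup I have in mind. It assumes the nodes log through java.util.logging and that Ignite's debug output maps to the FINE level; I haven't verified either on my cluster. The class and method names are made up for illustration. The only Ignite-specific piece is the package name org.apache.ignite.spi.discovery.tcp, which is where TcpDiscoverySpi lives.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: raise log detail for the TCP discovery package only.
public class DiscoveryLogging {
    // Package that contains TcpDiscoverySpi.
    static final String DISCOVERY_PKG = "org.apache.ignite.spi.discovery.tcp";

    // Keep a strong reference; java.util.logging holds loggers weakly,
    // so a level set on an unreferenced logger can be lost.
    static final Logger DISCOVERY_LOGGER = Logger.getLogger(DISCOVERY_PKG);

    static void enableDiscoveryDebug() {
        Logger root = Logger.getLogger("");

        // Everything else inherits INFO from the root logger.
        root.setLevel(Level.INFO);

        // Only the discovery package emits FINE (assumed debug) records.
        DISCOVERY_LOGGER.setLevel(Level.FINE);

        // Handlers have their own threshold, so lower it to let FINE through.
        for (Handler h : root.getHandlers()) {
            h.setLevel(Level.FINE);
        }
    }

    public static void main(String[] args) {
        // Would need to run before the node is started.
        enableDiscoveryDebug();
    }
}
```

The idea is that only discovery messages become more detailed while the rest of Ignite stays at INFO. Whether that is enough to explain the disconnects is exactly what I'm unsure about.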
Thanks!
Ryan