Hi Val,

Thanks for explaining the difference between the events. After some reading I figured out that I don't need to check anything once I get the segmented event. You mentioned that nodes still in the topology get the node failed/left event; then why do I get a node left event on the segmented node? Look at the event timeline I shared in the first post.

The application processes a huge amount of time series data, which arrives at a set interval. The processing has two stages. Once we get the data, I distribute it for first-stage processing based on a key. After the first stage, we redistribute the data for the second stage based on a different key. Both stages use a bunch of other metadata that also lives in distributed caches. Some of it is small, so I have replicated it on all nodes. Some of it is huge, so it is partitioned. This metadata does not change much.

The other issue I faced is that once one of the nodes gets segmented, the other node dies. It dies because its heap usage jumps instantly, from 4 GB to 11 GB. Very strange. Any idea what could cause this?

Thanks,
Biren

On 8/21/17, 6:53 PM, "vkulichenko" <[email protected]> wrote:

Hi Biren,

What is the use case and what are you trying to achieve by all this?

First of all, there is a difference between node_left/failed and node_segmented events. The former is fired on nodes that are still in the topology to notify them that one of the nodes left or failed. The latter means that the *local* node got segmented, and I don't think it makes sense to do any checks there.

Segmentation can happen for various reasons, but in the vast majority of cases it's a long GC pause. In this case the node does not close connections but becomes unresponsive, which causes the cluster to remove it from the topology after the failure detection timeout. When the GC pause finishes, the node tries to continue to operate, but realizes that it was already kicked out. It then fires the node_segmented event locally and stops immediately. This is correct behavior.

-Val
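
To make the sequence Val describes easier to follow, here is a toy model of it in Python. It is only a sketch of the description above and does not use or reproduce Ignite's API: the `Node` class, the `simulate_pause` function, the event name strings, and the 10-second timeout value are all hypothetical and chosen for illustration. It shows which side records which event when one node pauses for longer than the failure detection timeout.

```python
from dataclasses import dataclass, field

# Assumed value for illustration only; not read from any real configuration.
FAILURE_DETECTION_TIMEOUT_MS = 10_000


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node; not an Ignite class."""
    name: str
    in_topology: bool = True
    events: list = field(default_factory=list)


def simulate_pause(nodes, paused_name, pause_ms,
                   timeout_ms=FAILURE_DETECTION_TIMEOUT_MS):
    """Replay one long pause and record the event each node would see."""
    paused = next(n for n in nodes if n.name == paused_name)

    # A pause shorter than the timeout goes unnoticed: no events at all.
    if pause_ms <= timeout_ms:
        return nodes

    # The pause outlasted the timeout, so the remaining nodes remove the
    # silent node from the topology and are notified that it failed.
    paused.in_topology = False
    for node in nodes:
        if node is not paused:
            node.events.append(("NODE_FAILED", paused_name))

    # When the pause ends, the paused node learns it was kicked out and
    # records the segmentation event locally.
    paused.events.append(("NODE_SEGMENTED", paused_name))
    return nodes


if __name__ == "__main__":
    cluster = [Node("A"), Node("B"), Node("C")]
    for n in simulate_pause(cluster, "B", pause_ms=15_000):
        print(n.name, n.in_topology, n.events)
```

In this model the paused node records only the segmentation event, while the failed event appears only on the nodes that stayed in the topology. It does not attempt to explain the node left event seen on the segmented node in the timeline from the first post.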
