Hi Val,

Thanks for explaining the difference between the events. After some reading I 
figured out that I don’t need to check anything once I get the segmented event. 
You mentioned that the nodes still in the topology get the node failed/left 
event, so why do I get a node left event on the segmented node? Look at the 
event timeline I shared in the first post.

The application processes a huge amount of time series data. We get the data at 
a set interval. The data processing has two stages. Once we get the data, I 
distribute it for first-stage processing based on a key. After the first 
stage, we redistribute the data for the second stage based on a different key. 
For both stages, we have a bunch of other metadata that is also kept in 
distributed caches. Some of it is small, so I have replicated it on all nodes. 
Some of it is huge and is partitioned. This metadata does not change much.
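
To make the two-stage flow concrete, here is a minimal sketch in plain Java. 
It is not our actual code and does not use Ignite's affinity function: simple 
hash-modulo partitioning stands in for it, and the Sample class, its field 
names, and the node count of 3 are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class TwoStageRouting {
    /** Hypothetical time-series sample carrying one routing key per stage. */
    static final class Sample {
        final String stage1Key;
        final String stage2Key;
        final double value;

        Sample(String stage1Key, String stage2Key, double value) {
            this.stage1Key = stage1Key;
            this.stage2Key = stage2Key;
            this.value = value;
        }
    }

    /** Stand-in for an affinity function: non-negative hash modulo the node count. */
    static int nodeFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** Groups samples by the index of the node that owns the key extracted by keyFn. */
    static Map<Integer, List<Sample>> distribute(
            List<Sample> samples, Function<Sample, String> keyFn, int nodeCount) {
        Map<Integer, List<Sample>> byNode = new TreeMap<>();
        for (Sample s : samples) {
            int node = nodeFor(keyFn.apply(s), nodeCount);
            byNode.computeIfAbsent(node, n -> new ArrayList<>()).add(s);
        }
        return byNode;
    }

    public static void main(String[] args) {
        List<Sample> batch = List.of(
                new Sample("sensor-a", "region-1", 1.0),
                new Sample("sensor-a", "region-2", 2.0),
                new Sample("sensor-b", "region-1", 3.0));

        // Stage 1: route by the first key; stage 2: reroute the same data by the second key.
        Map<Integer, List<Sample>> stage1 = distribute(batch, s -> s.stage1Key, 3);
        Map<Integer, List<Sample>> stage2 = distribute(batch, s -> s.stage2Key, 3);

        System.out.println("stage 1 nodes used: " + stage1.keySet());
        System.out.println("stage 2 nodes used: " + stage2.keySet());
    }
}
```

The point of the sketch is only that the same sample can land on different 
nodes in the two stages, which is why the data has to move between them.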

The other issue I faced was that once one of the nodes gets segmented, the other 
node dies. It dies because heap usage on the other node jumped instantly, from 
4 GB to 11 GB. Very strange. Any idea what could cause this?

Thanks,
Biren

On 8/21/17, 6:53 PM, "vkulichenko" <[email protected]> wrote:

    Hi Biren,
    
    What is the use case and what are you trying to achieve by all this?
    
    First of all, there is a difference between node_left/failed and
    node_segmented events. The former is fired on nodes that are still in
    topology to notify that one of the nodes left or failed. But the latter
    means that *local* node got segmented, and I don't think it makes sense to
    do any checks there.
    
    Segmentation can happen for various reasons, but in the vast majority of cases
    it's a long GC pause. In this case the node does not close connections, but
    becomes unresponsive, which causes the cluster to remove it from the topology
    after the failure detection timeout. When the GC pause finishes, the node tries
    to continue operating, but realizes that it was already kicked out. It then
    fires the node_segmented event locally and stops immediately. This is correct
    behavior.
    
    -Val
    
    
    
    
