John,

Yes, client nodes should have this parameter too.

Evgenii

Mon, 11 May 2020 at 07:54, John Smith <[email protected]>:

> I mean both the prefer IPV4 and the Zookeeper discovery should be on the
> "central" cluster as well as all nodes specifically marked as client = true?
>
> On Mon, 11 May 2020 at 09:59, John Smith <[email protected]> wrote:
>
>> Should be on client nodes as well that are specifically setClient = true?
>>
>> On Fri, 8 May 2020 at 22:26, Evgenii Zhuravlev <[email protected]> wrote:
>>
>>> John,
>>>
>>> It looks like a split-brain. They were in one cluster at first. I'm not
>>> sure what the reason for this was; it could be a network problem or
>>> something else.
>>>
>>> I saw in the logs that you use both IPv4 and IPv6. I would recommend using
>>> only one of them to avoid problems - just add
>>> -Djava.net.preferIPv4Stack=true
>>> to all nodes in the cluster.
>>>
>>> Also, to avoid split-brain situations, you can use Zookeeper Discovery:
>>> https://apacheignite.readme.io/docs/zookeeper-discovery#failures-and-split-brain-handling
>>> or implement a segmentation resolver. More information about the second
>>> can be found on the forum, for example, here:
>>> http://apache-ignite-users.70518.x6.nabble.com/split-brain-problem-and-GridSegmentationProcessor-td14590.html
>>>
>>> Evgenii
>>>
>>> Fri, 8 May 2020 at 14:30, John Smith <[email protected]>:
>>>
>>>> How though? It's the same cluster! We haven't changed anything,
>>>> this happened on its own...
>>>>
>>>> All I did was reboot the node and the cluster fixed itself.
>>>>
>>>> On Fri, 8 May 2020 at 15:32, Evgenii Zhuravlev <[email protected]> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> *Yes, it looks like they are in different clusters:*
>>>>> *Metrics from the node with a problem:*
>>>>> [15:17:28,668][INFO][grid-timeout-worker-#23%xxxxxx%][IgniteKernal%xxxxxx]
>>>>> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>>>>>     ^-- Node [id=5bbf262e, name=xxxxxx, uptime=93 days, 19:36:10.921]
>>>>>     ^-- H/N/C [hosts=3, nodes=4, CPUs=10]
>>>>>
>>>>> *Metrics from another node:*
>>>>> [15:17:05,635][INFO][grid-timeout-worker-#23%xxxxxx%][IgniteKernal%xxxxxx]
>>>>> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>>>>>     ^-- Node [id=dddefdcd, name=xxxxxx, uptime=19 days, 16:49:48.381]
>>>>>     ^-- H/N/C [hosts=6, nodes=7, CPUs=21]
>>>>>
>>>>> *The same topology version on 2 nodes shows different nodes:*
>>>>> [03:56:17,643][INFO][disco-event-worker-#42%xxxxxx%][GridDiscoveryManager] Topology snapshot [ver=1036, locNode=5bbf262e, servers=1, clients=3, state=ACTIVE, CPUs=10, offheap=10.0GB, heap=13.0GB]
>>>>> [03:56:17,643][INFO][disco-event-worker-#42%xxxxxx%][GridDiscoveryManager]   ^-- Baseline [id=0, size=3, online=1, offline=2]
>>>>>
>>>>> *And*
>>>>>
>>>>> [03:56:43,388][INFO][disco-event-worker-#42%xxxxxx%][GridDiscoveryManager] Topology snapshot [ver=1036, locNode=4394fdd4, servers=2, clients=2, state=ACTIVE, CPUs=15, offheap=20.0GB, heap=19.0GB]
>>>>> [03:56:43,389][INFO][disco-event-worker-#42%xxxxxx%][GridDiscoveryManager]   ^-- Baseline [id=0, size=3, online=2, offline=1]
>>>>>
>>>>> So, it's just 2 different clusters.
>>>>>
>>>>> Best Regards,
>>>>> Evgenii
>>>>>
>>>>> Fri, 8 May 2020 at 08:50, John Smith <[email protected]>:
>>>>>
>>>>>> Hi Evgenii, here are the logs.
>>>>>>
>>>>>> https://www.dropbox.com/s/ke71qsoqg588kc8/ignite-logs.zip?dl=0
>>>>>>
>>>>>> On Fri, 8 May 2020 at 09:21, John Smith <[email protected]> wrote:
>>>>>>
>>>>>>> OK, let me try to get them...
>>>>>>>
>>>>>>> On Thu., May 7, 2020, 1:14 p.m. Evgenii Zhuravlev, <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> It looks like the third server node was not a part of this cluster
>>>>>>>> before restart. Can you share full logs from all server nodes?
>>>>>>>>
>>>>>>>> Evgenii
>>>>>>>>
>>>>>>>> Thu, 7 May 2020 at 09:11, John Smith <[email protected]>:
>>>>>>>>
>>>>>>>>> Hi, running 2.7.0 on 3 nodes deployed on VMs running Ubuntu.
>>>>>>>>>
>>>>>>>>> I checked the state of the cluster by going
>>>>>>>>> to: /ignite?cmd=currentState
>>>>>>>>> And the response was:
>>>>>>>>> {"successStatus":0,"error":null,"sessionToken":null,"response":true}
>>>>>>>>> I also checked: /ignite?cmd=size&cacheName=....
>>>>>>>>>
>>>>>>>>> 2 nodes were reporting 3 million records
>>>>>>>>> 1 node was reporting 2 million records.
>>>>>>>>>
>>>>>>>>> When I connected to visor and ran the node command, the details
>>>>>>>>> were wrong: it only showed 2 server nodes and only 1 client, but 3
>>>>>>>>> server nodes actually exist and more clients are connected.
>>>>>>>>>
>>>>>>>>> So I rebooted the node that was claiming 2 million records instead
>>>>>>>>> of 3, and when I re-ran the node command it displayed all the proper
>>>>>>>>> nodes.
>>>>>>>>> Also, after the reboot all the nodes started reporting 2 million
>>>>>>>>> records instead of 3 million, so was there some sort of rebalancing
>>>>>>>>> or correction (the cache has a 90 day TTL)?
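
The per-node size check described above can be scripted. This is only a rough sketch, assuming each node's REST endpoint is reachable and answers `cmd=size` with the same JSON envelope shown for `currentState`; the helper names, the base URLs and the `tolerance` parameter are made up for illustration, and the cache name stays a parameter because it is not given in the thread.

```python
import json
from collections import Counter
from typing import Dict
from urllib.parse import quote
from urllib.request import urlopen


def cache_size(base_url: str, cache_name: str, timeout: float = 5.0) -> int:
    """Ask one node's REST endpoint for the size of a cache (hypothetical helper)."""
    url = f"{base_url}/ignite?cmd=size&cacheName={quote(cache_name)}"
    with urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: successStatus 0 means success, as in the currentState reply.
    if body.get("successStatus") != 0:
        raise RuntimeError(f"{base_url}: {body.get('error')}")
    return int(body["response"])


def disagreeing_nodes(sizes: Dict[str, int], tolerance: int = 0) -> Dict[str, int]:
    """Return the nodes whose reported size differs from the most common value.

    `tolerance` allows for small differences on a cache that is being
    written to while the nodes are polled one after another.
    """
    if not sizes:
        return {}
    reference, _ = Counter(sizes.values()).most_common(1)[0]
    return {
        node: size
        for node, size in sizes.items()
        if abs(size - reference) > tolerance
    }
```

One would collect `cache_size(url, name)` for every server node into a dict and pass it to `disagreeing_nodes`; a non-empty result is a hint, not proof, that the nodes are not looking at the same cluster.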
>>>>>>>>>
>>>>>>>>> Before reboot
>>>>>>>>>
>>>>>>>>> +=================================================================================================+
>>>>>>>>> | # | Node ID8(@), IP        | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
>>>>>>>>> +=================================================================================================+
>>>>>>>>> | 0 | xxxxxx(@n0), xxxxxx.69 | xxxxxx        | Server    | 20:25:30 | 4    | 1.27 %   | 84.00 %   |
>>>>>>>>> | 1 | xxxxxx(@n1), xxxxxx.1  | xxxxxx        | Client    | 13:12:01 | 3    | 0.67 %   | 74.00 %   |
>>>>>>>>> | 2 | xxxxxx(@n2), xxxxxx.63 | xxxxxx        | Server    | 16:55:05 | 4    | 6.57 %   | 84.00 %   |
>>>>>>>>> +-------------------------------------------------------------------------------------------------+
>>>>>>>>>
>>>>>>>>> After reboot
>>>>>>>>>
>>>>>>>>> +=================================================================================================+
>>>>>>>>> | # | Node ID8(@), IP        | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
>>>>>>>>> +=================================================================================================+
>>>>>>>>> | 0 | xxxxxx(@n0), xxxxxx.69 | xxxxxx        | Server    | 21:13:45 | 4    | 0.77 %   | 56.00 %   |
>>>>>>>>> | 1 | xxxxxx(@n1), xxxxxx.1  | xxxxxx        | Client    | 14:00:17 | 3    | 0.77 %   | 56.00 %   |
>>>>>>>>> | 2 | xxxxxx(@n2), xxxxxx.63 | xxxxxx        | Server    | 17:43:20 | 4    | 1.00 %   | 60.00 %   |
>>>>>>>>> | 3 | xxxxxx(@n3), xxxxxx.65 | xxxxxx        | Client    | 01:42:45 | 4    | 4.10 %   | 56.00 %   |
>>>>>>>>> | 4 | xxxxxx(@n4), xxxxxx.65 | xxxxxx        | Client    | 01:42:45 | 4    | 3.93 %   | 56.00 %   |
>>>>>>>>> | 5 | xxxxxx(@n5), xxxxxx.1  | xxxxxx        | Client    | 16:59:53 | 2    | 0.67 %   | 91.00 %   |
>>>>>>>>> | 6 | xxxxxx(@n6), xxxxxx.79 | xxxxxx        | Server    | 00:41:31 | 4    | 1.00 %   | 97.00 %   |
>>>>>>>>> +-------------------------------------------------------------------------------------------------+
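
The comparison made earlier in the thread (the same topology version reported with different server and client counts) can be automated over the collected logs. The following is a minimal sketch, assuming the "Topology snapshot" line keeps the format quoted in this thread; the function name and the regular expression are illustrative and not part of Ignite.

```python
import re
from collections import defaultdict
from typing import Dict, Tuple

# Matches the "Topology snapshot" lines quoted in this thread. \s* after each
# comma tolerates lines that were wrapped by a mail client.
SNAPSHOT = re.compile(
    r"Topology snapshot \[ver=(?P<ver>\d+),\s*locNode=(?P<node>\w+),"
    r"\s*servers=(?P<servers>\d+),\s*clients=(?P<clients>\d+)"
)

# The two snapshot lines from the thread, used as sample input.
SAMPLE = """
[03:56:17,643][INFO][disco-event-worker-#42%xxxxxx%][GridDiscoveryManager] Topology snapshot [ver=1036, locNode=5bbf262e, servers=1, clients=3, state=ACTIVE, CPUs=10, offheap=10.0GB, heap=13.0GB]
[03:56:43,388][INFO][disco-event-worker-#42%xxxxxx%][GridDiscoveryManager] Topology snapshot [ver=1036, locNode=4394fdd4, servers=2, clients=2, state=ACTIVE, CPUs=15, offheap=20.0GB, heap=19.0GB]
"""


def conflicting_versions(log_text: str) -> Dict[int, Dict[str, Tuple[int, int]]]:
    """Return topology versions that nodes report with different membership.

    The result maps version -> {locNode: (servers, clients)} and contains
    only versions where at least two nodes disagree on the counts.
    """
    views: Dict[int, Dict[str, Tuple[int, int]]] = defaultdict(dict)
    for m in SNAPSHOT.finditer(log_text):
        counts = (int(m["servers"]), int(m["clients"]))
        views[int(m["ver"])][m["node"]] = counts
    return {
        ver: nodes
        for ver, nodes in views.items()
        if len(set(nodes.values())) > 1
    }


print(conflicting_versions(SAMPLE))
```

To use it on real data, concatenate the log files from all server nodes and pass the text in. Any version that shows up in the result was seen with different node counts by different nodes, which is the split-brain symptom described above.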
