Re: Cache was inconsistent state

2020-05-08 Thread Evgenii Zhuravlev
John,

It looks like a split-brain. They were in one cluster at first. I'm not
sure what the reason for this was; it could be a network problem or
something else.

I saw in the logs that you use both IPv4 and IPv6. I would recommend using
only one of them to avoid problems - just add -Djava.net.preferIPv4Stack=true
to the JVM options of all nodes in the cluster.
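
A minimal sketch, assuming the node is started programmatically (the property can equally be passed as a JVM argument in ignite.sh / JVM_OPTS; the config path below is a placeholder):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class StartNode {
    public static void main(String[] args) {
        // Prefer the IPv4 stack; must be set before any networking is initialized.
        // Equivalent to passing -Djava.net.preferIPv4Stack=true on the command line.
        System.setProperty("java.net.preferIPv4Stack", "true");

        Ignite ignite = Ignition.start("config/ignite-node.xml"); // placeholder config path
    }
}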

Also, to avoid split-brain situations, you can use Zookeeper Discovery:
https://apacheignite.readme.io/docs/zookeeper-discovery#failures-and-split-brain-handling
or
implement a segmentation resolver. More information about the latter can be
found on the forum, for example, here:
http://apache-ignite-users.70518.x6.nabble.com/split-brain-problem-and-GridSegmentationProcessor-td14590.html
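
For reference, a rough sketch of configuring ZooKeeper Discovery programmatically (requires the ignite-zookeeper module on the classpath; the connection string, root path and timeout below are placeholders, not recommendations):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi;

public class ZkDiscoveryNode {
    public static void main(String[] args) {
        // ZooKeeper-based discovery keeps a single source of truth for the topology,
        // which helps avoid independent clusters forming after a network split.
        ZookeeperDiscoverySpi zkSpi = new ZookeeperDiscoverySpi();
        zkSpi.setZkConnectionString("zk1:2181,zk2:2181,zk3:2181"); // placeholder ensemble
        zkSpi.setZkRootPath("/ignite/discovery");                  // placeholder root path
        zkSpi.setSessionTimeout(30_000);                           // placeholder timeout, ms

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(zkSpi);

        Ignition.start(cfg);
    }
}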

Evgenii

On Fri, 8 May 2020 at 14:30, John Smith wrote:

> How though? It's the same cluster! We haven't changed anything;
> this happened on its own...
>
> All I did was reboot the node and the cluster fixed itself.
>
> On Fri, 8 May 2020 at 15:32, Evgenii Zhuravlev 
> wrote:
>
>> Hi John,
>>
>> *Yes, it looks like they are in different clusters:*
>> *Metrics from the node with a problem:*
>> [15:17:28,668][INFO][grid-timeout-worker-#23%xx%][IgniteKernal%xx]
>>
>> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>> ^-- Node [id=5bbf262e, name=xx, uptime=93 days, 19:36:10.921]
>> ^-- H/N/C [hosts=3, nodes=4, CPUs=10]
>>
>> *Metrics from another node:*
>> [15:17:05,635][INFO][grid-timeout-worker-#23%xx%][IgniteKernal%xx]
>>
>> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>> ^-- Node [id=dddefdcd, name=xx, uptime=19 days, 16:49:48.381]
>> ^-- H/N/C [hosts=6, nodes=7, CPUs=21]
>>
>> *The same topology version on the 2 nodes contains different nodes:*
>> [03:56:17,643][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
>> Topology snapshot [ver=1036, locNode=5bbf262e, servers=1, clients=3,
>> state=ACTIVE, CPUs=10, offheap=10.0GB, heap=13.0GB]
>> [03:56:17,643][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
>>   ^-- Baseline [id=0, size=3, online=1, offline=2]
>>
>> *And*
>>
>> [03:56:43,388][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
>> Topology snapshot [ver=1036, locNode=4394fdd4, servers=2, clients=2,
>> state=ACTIVE, CPUs=15, offheap=20.0GB, heap=19.0GB]
>> [03:56:43,389][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
>>   ^-- Baseline [id=0, size=3, online=2, offline=1]
>>
>> So, it's just 2 different clusters.
>>
>> Best Regards,
>> Evgenii
>>
>> On Fri, 8 May 2020 at 08:50, John Smith wrote:
>>
>>> Hi Evgenii, here are the logs.
>>>
>>> https://www.dropbox.com/s/ke71qsoqg588kc8/ignite-logs.zip?dl=0
>>>
>>> On Fri, 8 May 2020 at 09:21, John Smith  wrote:
>>>
Ok, let me try to get them...

 On Thu., May 7, 2020, 1:14 p.m. Evgenii Zhuravlev, <
 e.zhuravlev...@gmail.com> wrote:

> Hi,
>
> It looks like the third server node was not a part of this cluster
> before restart. Can you share full logs from all server nodes?
>
> Evgenii
>
> On Thu, 7 May 2020 at 09:11, John Smith wrote:
>
>> Hi, running 2.7.0 on 3 nodes deployed on VMs running Ubuntu.
>>
>> I checked the state of the cluster by going
>> to: /ignite?cmd=currentState
>> And the response was: 
>> {"successStatus":0,"error":null,"sessionToken":null,"response":true}
>> I also checked: /ignite?cmd=size&cacheName=
>>
>> 2 nodes were reporting 3 million records
>> 1 node was reporting 2 million records.
>>
>> When I connected to visor and ran the node command... The details
>> were wrong, as it only showed 2 server nodes and only 1 client, but 3
>> server nodes actually exist and more clients are connected.
>>
>> So I rebooted the node that was claiming 2 million records instead of
>> 3, and when I re-ran it, the node command displayed all the proper nodes.
>> Also, after the reboot all the nodes started reporting 2 million
>> records instead of 3 million, so was there some sort of rebalancing or
>> correction (the cache has a 90-day TTL)?
>>
>>
>>
>> Before reboot
>>
>> +=+
>> | # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
>> +=+
>> | 0 | xx(@n0), xx.69  | xx | Server | 20:25:30 | 4 | 1.27 % | 84.00 % |
>> | 1 | xx(@n1), xx.1   | xx | Client | 13:12:01 | 3 | 0.67 % | 74.00 % |
>> | 2 | xx(@n2), xx.63  | xx | Server | 16:55:05 | 4 | 6.57 % | 84.00 % |
>> +

Re: Cache was inconsistent state

2020-05-08 Thread John Smith
How though? It's the same cluster! We haven't changed anything;
this happened on its own...

All I did was reboot the node and the cluster fixed itself.

On Fri, 8 May 2020 at 15:32, Evgenii Zhuravlev 
wrote:

> Hi John,
>
> *Yes, it looks like they are in different clusters:*
> *Metrics from the node with a problem:*
> [15:17:28,668][INFO][grid-timeout-worker-#23%xx%][IgniteKernal%xx]
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=5bbf262e, name=xx, uptime=93 days, 19:36:10.921]
> ^-- H/N/C [hosts=3, nodes=4, CPUs=10]
>
> *Metrics from another node:*
> [15:17:05,635][INFO][grid-timeout-worker-#23%xx%][IgniteKernal%xx]
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=dddefdcd, name=xx, uptime=19 days, 16:49:48.381]
> ^-- H/N/C [hosts=6, nodes=7, CPUs=21]
>
> *The same topology version on the 2 nodes contains different nodes:*
> [03:56:17,643][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
> Topology snapshot [ver=1036, locNode=5bbf262e, servers=1, clients=3,
> state=ACTIVE, CPUs=10, offheap=10.0GB, heap=13.0GB]
> [03:56:17,643][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
>   ^-- Baseline [id=0, size=3, online=1, offline=2]
>
> *And*
>
> [03:56:43,388][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
> Topology snapshot [ver=1036, locNode=4394fdd4, servers=2, clients=2,
> state=ACTIVE, CPUs=15, offheap=20.0GB, heap=19.0GB]
> [03:56:43,389][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
>   ^-- Baseline [id=0, size=3, online=2, offline=1]
>
> So, it's just 2 different clusters.
>
> Best Regards,
> Evgenii
>
> On Fri, 8 May 2020 at 08:50, John Smith wrote:
>
>> Hi Evgenii, here are the logs.
>>
>> https://www.dropbox.com/s/ke71qsoqg588kc8/ignite-logs.zip?dl=0
>>
>> On Fri, 8 May 2020 at 09:21, John Smith  wrote:
>>
>>> Ok, let me try to get them...
>>>
>>> On Thu., May 7, 2020, 1:14 p.m. Evgenii Zhuravlev, <
>>> e.zhuravlev...@gmail.com> wrote:
>>>
 Hi,

 It looks like the third server node was not a part of this cluster
 before restart. Can you share full logs from all server nodes?

 Evgenii

On Thu, 7 May 2020 at 09:11, John Smith wrote:

> Hi, running 2.7.0 on 3 nodes deployed on VMs running Ubuntu.
>
> I checked the state of the cluster by going
> to: /ignite?cmd=currentState
> And the response was: 
> {"successStatus":0,"error":null,"sessionToken":null,"response":true}
> I also checked: /ignite?cmd=size&cacheName=
>
> 2 nodes were reporting 3 million records
> 1 node was reporting 2 million records.
>
> When I connected to visor and ran the node command... The details
> were wrong, as it only showed 2 server nodes and only 1 client, but 3
> server nodes actually exist and more clients are connected.
>
> So I rebooted the node that was claiming 2 million records instead of
> 3, and when I re-ran it, the node command displayed all the proper nodes.
> Also, after the reboot all the nodes started reporting 2 million
> records instead of 3 million, so was there some sort of rebalancing or
> correction (the cache has a 90-day TTL)?
>
>
>
> Before reboot
>
> +=+
> | # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
> +=+
> | 0 | xx(@n0), xx.69  | xx | Server | 20:25:30 | 4 | 1.27 % | 84.00 % |
> | 1 | xx(@n1), xx.1   | xx | Client | 13:12:01 | 3 | 0.67 % | 74.00 % |
> | 2 | xx(@n2), xx.63  | xx | Server | 16:55:05 | 4 | 6.57 % | 84.00 % |
> +-+
>
> After reboot
>
> +=+
> | # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
> +=+
> | 0 | xx(@n0), xx.69  | xx | Server | 21:13:45 | 4 | 0.77 % | 56.00 % |
> | 1 | xx(@n1), xx.1   | xx | Client | 14:00:17 | 3 | 0.77 % | 56.00 % |
> | 2 | xx(@n2), xx.63  | xx | Server | 17:43:20 | 4 | 1.00 % | 60.00 % |
> | 3 | xx(@n3), xx.65  | xx | Client | 01:42:45 | 4 | 4.10 % | 56.00 % |
> | 4 | x

Re: Cache was inconsistent state

2020-05-08 Thread Evgenii Zhuravlev
Hi John,

*Yes, it looks like they are in different clusters:*
*Metrics from the node with a problem:*
[15:17:28,668][INFO][grid-timeout-worker-#23%xx%][IgniteKernal%xx]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=5bbf262e, name=xx, uptime=93 days, 19:36:10.921]
^-- H/N/C [hosts=3, nodes=4, CPUs=10]

*Metrics from another node:*
[15:17:05,635][INFO][grid-timeout-worker-#23%xx%][IgniteKernal%xx]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=dddefdcd, name=xx, uptime=19 days, 16:49:48.381]
^-- H/N/C [hosts=6, nodes=7, CPUs=21]

*The same topology version on the 2 nodes contains different nodes:*
[03:56:17,643][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
Topology snapshot [ver=1036, locNode=5bbf262e, servers=1, clients=3,
state=ACTIVE, CPUs=10, offheap=10.0GB, heap=13.0GB]
[03:56:17,643][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
  ^-- Baseline [id=0, size=3, online=1, offline=2]

*And*

[03:56:43,388][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
Topology snapshot [ver=1036, locNode=4394fdd4, servers=2, clients=2,
state=ACTIVE, CPUs=15, offheap=20.0GB, heap=19.0GB]
[03:56:43,389][INFO][disco-event-worker-#42%xx%][GridDiscoveryManager]
  ^-- Baseline [id=0, size=3, online=2, offline=1]

So, it's just 2 different clusters.

Best Regards,
Evgenii

On Fri, 8 May 2020 at 08:50, John Smith wrote:

> Hi Evgenii, here are the logs.
>
> https://www.dropbox.com/s/ke71qsoqg588kc8/ignite-logs.zip?dl=0
>
> On Fri, 8 May 2020 at 09:21, John Smith  wrote:
>
>> Ok, let me try to get them...
>>
>> On Thu., May 7, 2020, 1:14 p.m. Evgenii Zhuravlev, <
>> e.zhuravlev...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> It looks like the third server node was not a part of this cluster
>>> before restart. Can you share full logs from all server nodes?
>>>
>>> Evgenii
>>>
>>> On Thu, 7 May 2020 at 09:11, John Smith wrote:
>>>
Hi, running 2.7.0 on 3 nodes deployed on VMs running Ubuntu.

 I checked the state of the cluster by going to: /ignite?cmd=currentState
 And the response was: 
 {"successStatus":0,"error":null,"sessionToken":null,"response":true}
 I also checked: /ignite?cmd=size&cacheName=

2 nodes were reporting 3 million records
 1 node was reporting 2 million records.

When I connected to visor and ran the node command... The details were
wrong, as it only showed 2 server nodes and only 1 client, but 3 server
nodes actually exist and more clients are connected.

So I rebooted the node that was claiming 2 million records instead of 3,
and when I re-ran it, the node command displayed all the proper nodes.
Also, after the reboot all the nodes started reporting 2 million records
instead of 3 million, so was there some sort of rebalancing or correction (the
cache has a 90-day TTL)?



 Before reboot

+=+
| # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
+=+
| 0 | xx(@n0), xx.69  | xx | Server | 20:25:30 | 4 | 1.27 % | 84.00 % |
| 1 | xx(@n1), xx.1   | xx | Client | 13:12:01 | 3 | 0.67 % | 74.00 % |
| 2 | xx(@n2), xx.63  | xx | Server | 16:55:05 | 4 | 6.57 % | 84.00 % |
+-+

 After reboot

+=+
| # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
+=+
| 0 | xx(@n0), xx.69  | xx | Server | 21:13:45 | 4 | 0.77 % | 56.00 % |
| 1 | xx(@n1), xx.1   | xx | Client | 14:00:17 | 3 | 0.77 % | 56.00 % |
| 2 | xx(@n2), xx.63  | xx | Server | 17:43:20 | 4 | 1.00 % | 60.00 % |
| 3 | xx(@n3), xx.65  | xx | Client | 01:42:45 | 4 | 4.10 % | 56.00 % |
| 4 | xx(@n4), xx.65  | xx | Client | 01:42:45 | 4 | 3.93 % | 56.00 % |
| 5 | xx(@n5), xx.1   | xx | Client | 16:59:53 | 2 | 0.67 % | 91.00 % |
| 6 | xx(@n6), xx.79  | xx | Server | 00:41:31 | 4 | 1.00 % | 97.00 % |
+

Re: Cache was inconsistent state

2020-05-08 Thread John Smith
Hi Evgenii, here are the logs.

https://www.dropbox.com/s/ke71qsoqg588kc8/ignite-logs.zip?dl=0

On Fri, 8 May 2020 at 09:21, John Smith  wrote:

> Ok, let me try to get them...
>
> On Thu., May 7, 2020, 1:14 p.m. Evgenii Zhuravlev, <
> e.zhuravlev...@gmail.com> wrote:
>
>> Hi,
>>
>> It looks like the third server node was not a part of this cluster before
>> restart. Can you share full logs from all server nodes?
>>
>> Evgenii
>>
>> On Thu, 7 May 2020 at 09:11, John Smith wrote:
>>
>>> Hi, running 2.7.0 on 3 nodes deployed on VMs running Ubuntu.
>>>
>>> I checked the state of the cluster by going to: /ignite?cmd=currentState
>>> And the response was: 
>>> {"successStatus":0,"error":null,"sessionToken":null,"response":true}
>>> I also checked: /ignite?cmd=size&cacheName=
>>>
>>> 2 nodes were reporting 3 million records
>>> 1 node was reporting 2 million records.
>>>
>>> When I connected to visor and ran the node command... The details were
>>> wrong, as it only showed 2 server nodes and only 1 client, but 3 server
>>> nodes actually exist and more clients are connected.
>>>
>>> So I rebooted the node that was claiming 2 million records instead of 3,
>>> and when I re-ran it, the node command displayed all the proper nodes.
>>> Also, after the reboot all the nodes started reporting 2 million records
>>> instead of 3 million, so was there some sort of rebalancing or correction (the
>>> cache has a 90-day TTL)?
>>>
>>>
>>>
>>> Before reboot
>>>
>>> +=+
>>> | # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
>>> +=+
>>> | 0 | xx(@n0), xx.69  | xx | Server | 20:25:30 | 4 | 1.27 % | 84.00 % |
>>> | 1 | xx(@n1), xx.1   | xx | Client | 13:12:01 | 3 | 0.67 % | 74.00 % |
>>> | 2 | xx(@n2), xx.63  | xx | Server | 16:55:05 | 4 | 6.57 % | 84.00 % |
>>> +-+
>>>
>>> After reboot
>>>
>>> +=+
>>> | # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
>>> +=+
>>> | 0 | xx(@n0), xx.69  | xx | Server | 21:13:45 | 4 | 0.77 % | 56.00 % |
>>> | 1 | xx(@n1), xx.1   | xx | Client | 14:00:17 | 3 | 0.77 % | 56.00 % |
>>> | 2 | xx(@n2), xx.63  | xx | Server | 17:43:20 | 4 | 1.00 % | 60.00 % |
>>> | 3 | xx(@n3), xx.65  | xx | Client | 01:42:45 | 4 | 4.10 % | 56.00 % |
>>> | 4 | xx(@n4), xx.65  | xx | Client | 01:42:45 | 4 | 3.93 % | 56.00 % |
>>> | 5 | xx(@n5), xx.1   | xx | Client | 16:59:53 | 2 | 0.67 % | 91.00 % |
>>> | 6 | xx(@n6), xx.79  | xx | Server | 00:41:31 | 4 | 1.00 % | 97.00 % |
>>> +-+
>>>
>>


Re: Event for an update that should have been filtered is received in Local Listener of Continuous Query when a 1000 row insert is triggered

2020-05-08 Thread akorensh
It might; I would need to see a reproducer to make a determination.





Re: Event for an update that should have been filtered is received in Local Listener of Continuous Query when a 1000 row insert is triggered

2020-05-08 Thread VeenaMithare
I see this line being printed just before any local listener is invoked:
2020-05-06T16:28:17,909 INFO  o.a.i.s.c.t.TcpCommunicationSpi
[grid-nio-worker-tcp-comm-4-#255%ActivDataPublisher-ACTIVEI2-igniteclient-GREEN%]:
Accepted incoming communication connection [locAddr=/x.x.x.x:yyy,
rmtAddr=/x.x.x.x:yyy]

The remote address is the address of the Ignite server node.


The question is: should a client whose remote filter should filter out an
update get this line at all?
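
For context, a minimal sketch of the kind of continuous query setup in question (the cache name, key/value types and filter condition below are placeholders, not our actual code):

import javax.cache.configuration.FactoryBuilder;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheEntryEventSerializableFilter;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class CqSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start(); // placeholder: client node config omitted
        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache"); // placeholder cache

        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

        // Remote filter: evaluated on the server nodes owning the updated entries;
        // only events that pass it are expected to reach the local listener.
        qry.setRemoteFilterFactory(FactoryBuilder.factoryOf(
            (CacheEntryEventSerializableFilter<Integer, String>) evt -> evt.getKey() % 2 == 0));

        // Local listener: invoked on this node for events that passed the remote filter.
        qry.setLocalListener(events ->
            events.forEach(e -> System.out.println("Update: " + e.getKey() + " -> " + e.getValue())));

        QueryCursor<?> cur = cache.query(qry); // keep the cursor open for as long as updates are needed
    }
}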






Re: Event for an update that should have been filtered is received in Local Listener of Continuous Query when a 1000 row insert is triggered

2020-05-08 Thread VeenaMithare
Hi Alex, 

Thank you for the reply.
>>  verify the continuous query definitions using the appropriate view:
https://apacheignite.readme.io/docs/continuous_queries

We are on version 2.7.6, so I guess this view is not available to us. Also, we
have been running the CQs for a couple of months now in our test env. and have
not faced any issues.
 
This issue is more recent and happens only sometimes (I cannot figure out what
could have caused it yet). Though the issue has happened a couple of times
on our test env., I am not able to reproduce this on my local machine. Also,
I was able to cause this failure only once on the Linux test env (never so
far on my Windows machine, even though I have tested many scenarios so far).
It looks like some exceptional scenario or some race condition has caused
this.

Please note that we have recently added an EVT_NODE_SEGMENTED handler, and we
also have a handler for a cluster-switch request, where we switch to a different
cluster based on updates to a particular record in a particular table. The
handling of both events is to do ignite.close() and Ignition.start() (with
the right cluster config), roughly as sketched below.
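
A simplified sketch of that handler, assuming EVT_NODE_SEGMENTED is enabled via IgniteConfiguration.setIncludeEventTypes (the config file names and the restart-in-a-new-thread choice are placeholders for our actual switch-over logic):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.events.EventType;

public class SegmentationHandlerSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("green-cluster.xml"); // placeholder config

        // Listen locally for segmentation events and restart the node against the other cluster.
        ignite.events().localListen(evt -> {
            // Restart from a separate thread so we don't block the event-notification thread.
            new Thread(() -> {
                ignite.close();                      // leave the segmented cluster
                Ignition.start("blue-cluster.xml");  // placeholder config for the other cluster
            }).start();
            return true; // keep the listener registered
        }, EventType.EVT_NODE_SEGMENTED);
    }
}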

As mentioned, we have tested the event handler and huge inserts/updates
after segmentation etc. and I am not able to cause this issue to occur.  

regards,
Veena





Re: Cache was inconsistent state

2020-05-08 Thread John Smith
Ok, let me try to get them...

On Thu., May 7, 2020, 1:14 p.m. Evgenii Zhuravlev, 
wrote:

> Hi,
>
> It looks like the third server node was not a part of this cluster before
> restart. Can you share full logs from all server nodes?
>
> Evgenii
>
> On Thu, 7 May 2020 at 09:11, John Smith wrote:
>
>> Hi, running 2.7.0 on 3 nodes deployed on VMs running Ubuntu.
>>
>> I checked the state of the cluster by going to: /ignite?cmd=currentState
>> And the response was: 
>> {"successStatus":0,"error":null,"sessionToken":null,"response":true}
>> I also checked: /ignite?cmd=size&cacheName=
>>
>> 2 nodes were reporting 3 million records
>> 1 node was reporting 2 million records.
>>
>> When I connected to visor and ran the node command... The details were
>> wrong, as it only showed 2 server nodes and only 1 client, but 3 server
>> nodes actually exist and more clients are connected.
>>
>> So I rebooted the node that was claiming 2 million records instead of 3,
>> and when I re-ran it, the node command displayed all the proper nodes.
>> Also, after the reboot all the nodes started reporting 2 million records
>> instead of 3 million, so was there some sort of rebalancing or correction (the
>> cache has a 90-day TTL)?
>>
>>
>>
>> Before reboot
>>
>> +=+
>> | # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
>> +=+
>> | 0 | xx(@n0), xx.69  | xx | Server | 20:25:30 | 4 | 1.27 % | 84.00 % |
>> | 1 | xx(@n1), xx.1   | xx | Client | 13:12:01 | 3 | 0.67 % | 74.00 % |
>> | 2 | xx(@n2), xx.63  | xx | Server | 16:55:05 | 4 | 6.57 % | 84.00 % |
>> +-+
>>
>> After reboot
>>
>> +=+
>> | # | Node ID8(@), IP | Consistent ID | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
>> +=+
>> | 0 | xx(@n0), xx.69  | xx | Server | 21:13:45 | 4 | 0.77 % | 56.00 % |
>> | 1 | xx(@n1), xx.1   | xx | Client | 14:00:17 | 3 | 0.77 % | 56.00 % |
>> | 2 | xx(@n2), xx.63  | xx | Server | 17:43:20 | 4 | 1.00 % | 60.00 % |
>> | 3 | xx(@n3), xx.65  | xx | Client | 01:42:45 | 4 | 4.10 % | 56.00 % |
>> | 4 | xx(@n4), xx.65  | xx | Client | 01:42:45 | 4 | 3.93 % | 56.00 % |
>> | 5 | xx(@n5), xx.1   | xx | Client | 16:59:53 | 2 | 0.67 % | 91.00 % |
>> | 6 | xx(@n6), xx.79  | xx | Server | 00:41:31 | 4 | 1.00 % | 97.00 % |
>> +-+
>>
>


Re: Deploying the Ignite Maven Project in LINUX

2020-05-08 Thread nithin91
Thanks for sharing the link. It is really very helpful.


