[jira] [Assigned] (GEODE-10250) The LockGrantor can grant a lock to a member that has left the distributed system

2022-04-19 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-10250:
---

Assignee: Barrett Oglesby

> The LockGrantor can grant a lock to a member that has left the distributed 
> system
> -
>
> Key: GEODE-10250
> URL: https://issues.apache.org/jira/browse/GEODE-10250
> Project: Geode
>  Issue Type: Bug
>  Components: distributed lock service
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage
>
> If a member requests a distributed lock and then leaves the distributed 
> system, the grantor may grant that request and leave itself in a state where 
> the lock has been granted but the member has left.
> Here are the steps:
>  # The lock requesting server requests a lock
>  # The grantor server is delayed in granting that lock
>  # The lock requesting server shuts down in the meantime
>  # The grantor server finally grants the lock after it has released all locks 
> and pending requests for the lock requesting server
>  # The lock requesting server receives the lock response but drops it since 
> its thread pool has already been shut down



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (GEODE-10250) The LockGrantor can grant a lock to a member that has left the distributed system

2022-04-19 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-10250:
---

 Summary: The LockGrantor can grant a lock to a member that has 
left the distributed system
 Key: GEODE-10250
 URL: https://issues.apache.org/jira/browse/GEODE-10250
 Project: Geode
  Issue Type: Bug
  Components: distributed lock service
Reporter: Barrett Oglesby


If a member requests a distributed lock and then leaves the distributed system, 
the grantor may grant that request and leave itself in a state where the lock 
has been granted but the member has left.

Here are the steps:
 # The lock requesting server requests a lock
 # The grantor server is delayed in granting that lock
 # The lock requesting server shuts down in the meantime
 # The grantor server finally grants the lock after it has released all locks 
and pending requests for the lock requesting server
 # The lock requesting server receives the lock response but drops it since its 
thread pool has already been shut down (see the sketch below)
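The dropped response in step 5 can be reproduced in miniature with plain JDK 
executors. Below is a minimal standalone sketch (hypothetical class and 
messages, not Geode code): the requester's executor is shut down before the 
delayed grantor replies, so the grant response is rejected while the grantor 
still believes the lock is held.
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class GrantAfterDepartureDemo {
  public static void main(String[] args) throws InterruptedException {
    // The pool that would process lock responses on the requesting server.
    ExecutorService requesterPool = Executors.newSingleThreadExecutor();

    System.out.println("requester: lock requested");          // step 1

    // Step 3: the requesting server shuts down while the grantor is delayed.
    requesterPool.shutdown();
    requesterPool.awaitTermination(1, TimeUnit.SECONDS);

    System.out.println("grantor: lock granted");              // steps 2 and 4

    // Step 5: the response arrives but the pool rejects it, so the grantor
    // is left recording a lock held by a departed member.
    try {
      requesterPool.submit(() -> System.out.println("requester: got lock"));
    } catch (RejectedExecutionException e) {
      System.out.println("requester: response dropped (" + e + ")");
    }
  }
}
{noformat}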



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (GEODE-10148) [CI Failure] : JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer FAILED

2022-04-15 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522937#comment-17522937
 ] 

Barrett Oglesby commented on GEODE-10148:
-

I think this is where the problem is:

{{LocalManager.startLocalManagement}} runs the {{ManagementTask}} once right 
when it starts.

With logging added, the call to {{managementTask.get().run()}} returns right 
away. Even though the comment says it's a synchronous call, it isn't.
{noformat}
[vm3] [warn 2022/03/23 16:16:02.173 PDT server-3  tid=0x12] XXX LocalManager.startLocalManagement about 
to run managementTask

[vm3] [warn 2022/03/23 16:16:02.173 PDT server-3  tid=0x12] XXX LocalManager.startLocalManagement done 
managementTask
{noformat}
Then, {{LocalManager.markForFederation}} adds the mbeans to the 
{{federatedComponentMap}}:
{noformat}
[vm3] [warn 2022/03/23 16:16:02.209 PDT server-3  tid=0x12] XXX LocalManager.markForFederation about to 
add to federatedComponentMap objName=GemFire:type=Member,member=server-3

[vm3] [warn 2022/03/23 16:16:02.364 PDT server-3  tid=0x12] XXX LocalManager.markForFederation about to 
add to federatedComponentMap 
objName=GemFire:service=Region,name="/test-region-1",type=Member,member=server-3

[vm3] [warn 2022/03/23 16:16:02.437 PDT server-3  tid=0x12] XXX LocalManager.markForFederation about to 
add to federatedComponentMap 
objName=GemFire:service=CacheServer,port=20017,type=Member,member=server-3
{noformat}
The CacheServer mbean above is the one that is missing in the failed run.

Then, the {{Management Task}} thread runs the {{ManagementTask}} started above 
to put the mbeans into the region:
{noformat}
[vm3] [warn 2022/03/23 16:16:04.177 PDT server-3  tid=0x46] 
XXX LocalManager.doManagementTask about to putAll 
replicaMap={GemFire:service=CacheServer,port=20017,type=Member,member=server-3=ObjectName
 = GemFire:service=CacheServer,port=20017,type=Member,member=server-3, 
GemFire:service=Region,name="/test-region-1",type=Member,member=server-3=ObjectName
 = GemFire:service=Region,name="/test-region-1",type=Member,member=server-3, 
GemFire:type=Member,member=server-3=ObjectName = 
GemFire:type=Member,member=server-3}

[vm3] [warn 2022/03/23 16:16:04.211 PDT server-3  tid=0x46] 
XXX LocalManager.doManagementTask done putAll 
replicaMap={GemFire:service=CacheServer,port=20017,type=Member,member=server-3=ObjectName
 = GemFire:service=CacheServer,port=20017,type=Member,member=server-3, 
GemFire:service=Region,name="/test-region-1",type=Member,member=server-3=ObjectName
 = GemFire:service=Region,name="/test-region-1",type=Member,member=server-3, 
GemFire:type=Member,member=server-3=ObjectName = 
GemFire:type=Member,member=server-3}
{noformat}
If the {{Management Task}} thread runs between the additions of the Region and 
CacheServer mbeans, this issue would reproduce.


> [CI Failure] : JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer 
> FAILED
> --
>
> Key: GEODE-10148
> URL: https://issues.apache.org/jira/browse/GEODE-10148
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Affects Versions: 1.15.0
>Reporter: Nabarun Nag
>Priority: Major
>  Labels: test-stability
>
> JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer FAILED
> java.lang.AssertionError: 
> Expecting actual:
>   ["GemFire:service=AccessControl,type=Distributed",
> "GemFire:service=CacheServer,port=20842,type=Member,member=server-1",
> "GemFire:service=CacheServer,port=20846,type=Member,member=server-2",
> 
> "GemFire:service=DiskStore,name=cluster_config,type=Member,member=locator-one",
> "GemFire:service=FileUploader,type=Distributed",
> "GemFire:service=Locator,type=Member,member=locator-one",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Distributed",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator-one",
> "GemFire:service=Manager,type=Member,member=locator-one",
> "GemFire:service=Region,name="/test-region-1",type=Distributed",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-1",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-2",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-3",
> "GemFire:service=System,type=Distributed",
> "GemFire:type=Member,member=locator-one",
> "GemFire:type=Member,member=server-1",
> "GemFire:type=Member,member=server-2",
> "GemFire:type=Member,member=server-3"]
> to contain exactly (and in same order):
>   ["GemFire:service=AccessControl,type=Distributed",
> "GemFire:service=CacheServer,port=20842,type=Member,member=server-1",
> 

[jira] [Resolved] (GEODE-10212) In a WAN topology with 3 sites in a star pattern, stopping a sender between two of the sites causes an event to be dropped even though another path exists between the two sites

2022-04-01 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-10212.
-
Resolution: Duplicate

> In a WAN topology with 3 sites in a star pattern, stopping a sender between 
> two of the sites causes an event to be dropped even though another path 
> exists between the two sites
> 
>
> Key: GEODE-10212
> URL: https://issues.apache.org/jira/browse/GEODE-10212
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage
>
> A WAN topology in a star pattern means every site is connected to every other 
> site like:
> {noformat}
> site-A <--> site-B <--> site-C
>    ^_______________________^
> {noformat}
> If the sender from site-A to site-B is stopped and a put is done in site-A, 
> site-B doesn't receive the event even though site-A is connected to site-C 
> and site-C is connected to site-B.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-10212) In a WAN topology with 3 sites in a star pattern, stopping a sender between two of the sites causes an event to be dropped even though another path exists between the two sites

2022-04-01 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-10212:
---

Assignee: Barrett Oglesby

> In a WAN topology with 3 sites in a star pattern, stopping a sender between 
> two of the sites causes an event to be dropped even though another path 
> exists between the two sites
> 
>
> Key: GEODE-10212
> URL: https://issues.apache.org/jira/browse/GEODE-10212
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage
>
> A WAN topology in a star pattern means every site is connected to every other 
> site like:
> {noformat}
> site-A <--> site-B <--> site-C
>    ^_______________________^
> {noformat}
> If the sender from site-A to site-B is stopped and a put is done in site-A, 
> site-B doesn't receive the event even though site-A is connected to site-C 
> and site-C is connected to site-B.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-10212) In a WAN topology with 3 sites in a star pattern, stopping a sender between two of the sites causes an event to be dropped even though another path exists between the two sites

2022-04-01 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-10212:

Description: 
A WAN topology in a star pattern means every site is connected to every other 
site like:
{noformat}
site-A <--> site-B <--> site-C
   ^_______________________^
{noformat}
If the sender from site-A to site-B is stopped and a put is done in site-A, 
site-B doesn't receive the event even though site-A is connected to site-C and 
site-C is connected to site-B.

  was:
A WAN topology in a star pattern means every site is connected to every other 
site like:

site-A <-> site-B <-> site-C
   ^_____________________^

If the sender from site-A to site-B is stopped and a put is done in site-A, 
site-B doesn't receive the event even though site-A is connected to site-C and 
site-C is connected to site-B.


> In a WAN topology with 3 sites in a star pattern, stopping a sender between 
> two of the sites causes an event to be dropped even though another path 
> exists between the two sites
> 
>
> Key: GEODE-10212
> URL: https://issues.apache.org/jira/browse/GEODE-10212
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage
>
> A WAN topology in a star pattern means every site is connected to every other 
> site like:
> {noformat}
> site-A <--> site-B <--> site-C
>    ^_______________________^
> {noformat}
> If the sender from site-A to site-B is stopped and a put is done in site-A, 
> site-B doesn't receive the event even though site-A is connected to site-C 
> and site-C is connected to site-B.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-10212) In a WAN topology with 3 sites in a star pattern, stopping a sender between two of the sites causes an event to be dropped even though another path exists between the two sites

2022-04-01 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-10212:
---

 Summary: In a WAN topology with 3 sites in a star pattern, 
stopping a sender between two of the sites causes an event to be dropped even 
though another path exists between the two sites
 Key: GEODE-10212
 URL: https://issues.apache.org/jira/browse/GEODE-10212
 Project: Geode
  Issue Type: Bug
  Components: wan
Reporter: Barrett Oglesby


A WAN topology in a star pattern means every site is connected to every other 
site like:

site-A <-> site-B <-> site-C
   ^_____________________^

If the sender from site-A to site-B is stopped and a put is done in site-A, 
site-B doesn't receive the event even though site-A is connected to site-C and 
site-C is connected to site-B.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-10164) Revert wording change in rebalance result

2022-03-25 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-10164:

Affects Version/s: 1.15.0

> Revert wording change in rebalance result
> -
>
> Key: GEODE-10164
> URL: https://issues.apache.org/jira/browse/GEODE-10164
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Affects Versions: 1.15.0
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: blocks-1.15.0, needsTriage, pull-request-available
> Fix For: 1.12.10, 1.13.9, 1.14.5, 1.15.0
>
>
> I made a change to the wording of the rebalance command result
> from:
> {noformat}
> Rebalanced partition regions {noformat}
> to:
> {noformat}
> Rebalanced partitioned region {noformat}
> This change caused hydra and other tests to fail, so I'm reverting it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-10164) Revert wording change in rebalance result

2022-03-25 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-10164:

Labels: blocks-1.15.0 needsTriage pull-request-available  (was: needsTriage 
pull-request-available)

> Revert wording change in rebalance result
> -
>
> Key: GEODE-10164
> URL: https://issues.apache.org/jira/browse/GEODE-10164
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: blocks-1.15.0, needsTriage, pull-request-available
> Fix For: 1.12.10, 1.13.9, 1.14.5, 1.15.0
>
>
> I made a change to the wording of the rebalance command result
> from:
> {noformat}
> Rebalanced partition regions {noformat}
> to:
> {noformat}
> Rebalanced partitioned region {noformat}
> This change caused hydra and other tests to fail, so I'm reverting it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-10164) Revert wording change in rebalance result

2022-03-25 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-10164.
-
Fix Version/s: 1.12.10
   1.13.9
   1.14.5
   1.15.0
   Resolution: Fixed

> Revert wording change in rebalance result
> -
>
> Key: GEODE-10164
> URL: https://issues.apache.org/jira/browse/GEODE-10164
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage, pull-request-available
> Fix For: 1.12.10, 1.13.9, 1.14.5, 1.15.0
>
>
> I made a change to the wording of the rebalance command result
> from:
> {noformat}
> Rebalanced partition regions {noformat}
> to:
> {noformat}
> Rebalanced partitioned region {noformat}
> This change caused hydra and other tests to fail, so I'm reverting it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-10164) Revert wording change in rebalance result

2022-03-24 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-10164:
---

Assignee: Barrett Oglesby

> Revert wording change in rebalance result
> -
>
> Key: GEODE-10164
> URL: https://issues.apache.org/jira/browse/GEODE-10164
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage
>
> I made a change to the wording of the rebalance command result
> from:
> {noformat}
> Rebalanced partition regions {noformat}
> to:
> {noformat}
> Rebalanced partitioned region {noformat}
> This change caused hydra and other tests to fail, so I'm reverting it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-10164) Revert wording change in rebalance result

2022-03-24 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-10164:
---

 Summary: Revert wording change in rebalance result
 Key: GEODE-10164
 URL: https://issues.apache.org/jira/browse/GEODE-10164
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Reporter: Barrett Oglesby


I made a change to the wording of the rebalance command result

from:
{noformat}
Rebalanced partition regions {noformat}
to:
{noformat}
Rebalanced partitioned region {noformat}
This change caused hydra and other tests to fail, so I'm reverting it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-10148) [CI Failure] : JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer FAILED

2022-03-23 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511520#comment-17511520
 ] 

Barrett Oglesby commented on GEODE-10148:
-

The test is failing because the result of this call to the locator is missing 
the CacheServer MBean that exists in the expectedMBeans list:
{noformat}
List intermediateMBeans = getFederatedGemfireBeansFrom(locator1);
{noformat}
That MBean list in the locator is updated asynchronously by the ManagementTask 
in each member.

See ManagementResourceRepo.putAllInLocalMonitoringRegion. The 
localMonitoringRegion is DISTRIBUTED_NO_ACK.
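Given that, one way to make the test tolerate the asynchrony (a sketch only, 
reusing the test's getFederatedGemfireBeansFrom helper and assuming an 
expectedMBeans list) is to poll with Awaitility instead of asserting once:
{noformat}
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

import java.time.Duration;
import java.util.List;

// ...inside the test method:
await().atMost(Duration.ofSeconds(30)).untilAsserted(() -> {
  List<String> intermediateMBeans = getFederatedGemfireBeansFrom(locator1);
  assertThat(intermediateMBeans).containsExactlyElementsOf(expectedMBeans);
});
{noformat}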



> [CI Failure] : JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer 
> FAILED
> --
>
> Key: GEODE-10148
> URL: https://issues.apache.org/jira/browse/GEODE-10148
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Affects Versions: 1.15.0
>Reporter: Nabarun Nag
>Assignee: Owen Nichols
>Priority: Major
>  Labels: needsTriage
>
> JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer FAILED
> java.lang.AssertionError: 
> Expecting actual:
>   ["GemFire:service=AccessControl,type=Distributed",
> "GemFire:service=CacheServer,port=20842,type=Member,member=server-1",
> "GemFire:service=CacheServer,port=20846,type=Member,member=server-2",
> 
> "GemFire:service=DiskStore,name=cluster_config,type=Member,member=locator-one",
> "GemFire:service=FileUploader,type=Distributed",
> "GemFire:service=Locator,type=Member,member=locator-one",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Distributed",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator-one",
> "GemFire:service=Manager,type=Member,member=locator-one",
> "GemFire:service=Region,name="/test-region-1",type=Distributed",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-1",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-2",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-3",
> "GemFire:service=System,type=Distributed",
> "GemFire:type=Member,member=locator-one",
> "GemFire:type=Member,member=server-1",
> "GemFire:type=Member,member=server-2",
> "GemFire:type=Member,member=server-3"]
> to contain exactly (and in same order):
>   ["GemFire:service=AccessControl,type=Distributed",
> "GemFire:service=CacheServer,port=20842,type=Member,member=server-1",
> "GemFire:service=CacheServer,port=20846,type=Member,member=server-2",
> "GemFire:service=CacheServer,port=20850,type=Member,member=server-3",
> 
> "GemFire:service=DiskStore,name=cluster_config,type=Member,member=locator-one",
> "GemFire:service=FileUploader,type=Distributed",
> "GemFire:service=Locator,type=Member,member=locator-one",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Distributed",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator-one",
> "GemFire:service=Manager,type=Member,member=locator-one",
> "GemFire:service=Region,name="/test-region-1",type=Distributed",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-1",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-2",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-3",
> "GemFire:service=System,type=Distributed",
> "GemFire:type=Member,member=locator-one",
> "GemFire:type=Member,member=server-1",
> "GemFire:type=Member,member=server-2",
> "GemFire:type=Member,member=server-3"]
> but could not find the following elements:
>   ["GemFire:service=CacheServer,port=20850,type=Member,member=server-3"]
> at 
> org.apache.geode.management.internal.JMXMBeanFederationDUnitTest.MBeanFederationAddRemoveServer(JMXMBeanFederationDUnitTest.java:130)
> 8352 tests completed, 1 failed, 414 skipped



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-10148) [CI Failure] : JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer FAILED

2022-03-23 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-10148:
---

Assignee: (was: Barrett Oglesby)

> [CI Failure] : JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer 
> FAILED
> --
>
> Key: GEODE-10148
> URL: https://issues.apache.org/jira/browse/GEODE-10148
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Affects Versions: 1.15.0
>Reporter: Nabarun Nag
>Priority: Major
>  Labels: needsTriage
>
> JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer FAILED
> java.lang.AssertionError: 
> Expecting actual:
>   ["GemFire:service=AccessControl,type=Distributed",
> "GemFire:service=CacheServer,port=20842,type=Member,member=server-1",
> "GemFire:service=CacheServer,port=20846,type=Member,member=server-2",
> 
> "GemFire:service=DiskStore,name=cluster_config,type=Member,member=locator-one",
> "GemFire:service=FileUploader,type=Distributed",
> "GemFire:service=Locator,type=Member,member=locator-one",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Distributed",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator-one",
> "GemFire:service=Manager,type=Member,member=locator-one",
> "GemFire:service=Region,name="/test-region-1",type=Distributed",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-1",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-2",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-3",
> "GemFire:service=System,type=Distributed",
> "GemFire:type=Member,member=locator-one",
> "GemFire:type=Member,member=server-1",
> "GemFire:type=Member,member=server-2",
> "GemFire:type=Member,member=server-3"]
> to contain exactly (and in same order):
>   ["GemFire:service=AccessControl,type=Distributed",
> "GemFire:service=CacheServer,port=20842,type=Member,member=server-1",
> "GemFire:service=CacheServer,port=20846,type=Member,member=server-2",
> "GemFire:service=CacheServer,port=20850,type=Member,member=server-3",
> 
> "GemFire:service=DiskStore,name=cluster_config,type=Member,member=locator-one",
> "GemFire:service=FileUploader,type=Distributed",
> "GemFire:service=Locator,type=Member,member=locator-one",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Distributed",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator-one",
> "GemFire:service=Manager,type=Member,member=locator-one",
> "GemFire:service=Region,name="/test-region-1",type=Distributed",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-1",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-2",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-3",
> "GemFire:service=System,type=Distributed",
> "GemFire:type=Member,member=locator-one",
> "GemFire:type=Member,member=server-1",
> "GemFire:type=Member,member=server-2",
> "GemFire:type=Member,member=server-3"]
> but could not find the following elements:
>   ["GemFire:service=CacheServer,port=20850,type=Member,member=server-3"]
> at 
> org.apache.geode.management.internal.JMXMBeanFederationDUnitTest.MBeanFederationAddRemoveServer(JMXMBeanFederationDUnitTest.java:130)
> 8352 tests completed, 1 failed, 414 skipped



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-10148) [CI Failure] : JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer FAILED

2022-03-23 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-10148:
---

Assignee: Barrett Oglesby

> [CI Failure] : JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer 
> FAILED
> --
>
> Key: GEODE-10148
> URL: https://issues.apache.org/jira/browse/GEODE-10148
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Affects Versions: 1.15.0
>Reporter: Nabarun Nag
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage
>
> JMXMBeanFederationDUnitTest > MBeanFederationAddRemoveServer FAILED
> java.lang.AssertionError: 
> Expecting actual:
>   ["GemFire:service=AccessControl,type=Distributed",
> "GemFire:service=CacheServer,port=20842,type=Member,member=server-1",
> "GemFire:service=CacheServer,port=20846,type=Member,member=server-2",
> 
> "GemFire:service=DiskStore,name=cluster_config,type=Member,member=locator-one",
> "GemFire:service=FileUploader,type=Distributed",
> "GemFire:service=Locator,type=Member,member=locator-one",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Distributed",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator-one",
> "GemFire:service=Manager,type=Member,member=locator-one",
> "GemFire:service=Region,name="/test-region-1",type=Distributed",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-1",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-2",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-3",
> "GemFire:service=System,type=Distributed",
> "GemFire:type=Member,member=locator-one",
> "GemFire:type=Member,member=server-1",
> "GemFire:type=Member,member=server-2",
> "GemFire:type=Member,member=server-3"]
> to contain exactly (and in same order):
>   ["GemFire:service=AccessControl,type=Distributed",
> "GemFire:service=CacheServer,port=20842,type=Member,member=server-1",
> "GemFire:service=CacheServer,port=20846,type=Member,member=server-2",
> "GemFire:service=CacheServer,port=20850,type=Member,member=server-3",
> 
> "GemFire:service=DiskStore,name=cluster_config,type=Member,member=locator-one",
> "GemFire:service=FileUploader,type=Distributed",
> "GemFire:service=Locator,type=Member,member=locator-one",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Distributed",
> 
> "GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator-one",
> "GemFire:service=Manager,type=Member,member=locator-one",
> "GemFire:service=Region,name="/test-region-1",type=Distributed",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-1",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-2",
> 
> "GemFire:service=Region,name="/test-region-1",type=Member,member=server-3",
> "GemFire:service=System,type=Distributed",
> "GemFire:type=Member,member=locator-one",
> "GemFire:type=Member,member=server-1",
> "GemFire:type=Member,member=server-2",
> "GemFire:type=Member,member=server-3"]
> but could not find the following elements:
>   ["GemFire:service=CacheServer,port=20850,type=Member,member=server-3"]
> at 
> org.apache.geode.management.internal.JMXMBeanFederationDUnitTest.MBeanFederationAddRemoveServer(JMXMBeanFederationDUnitTest.java:130)
> 8352 tests completed, 1 failed, 414 skipped



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-10144) Regression in geode-native test CqPlusAuthInitializeTest.reAuthenticateWithDurable

2022-03-23 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511339#comment-17511339
 ] 

Barrett Oglesby commented on GEODE-10144:
-

This issue comes down to a few factors:
 - The default PoolFactory::DEFAULT_SUBSCRIPTION_ACK_INTERVAL of 100 seconds 
means all the events stay on the queue for the entire test, so every time the 
client disconnects, the server starts over again processing the queue from the 
beginning.
 - The NC client is version GFE 9.0 (an earlier version), so no 
ClientReAuthenticateMessage is sent to it when an 
AuthenticationExpiredException occurs. The server waits anyway in case new 
credentials are sent through another operation.
 - With the new changes, the server waits 5 seconds to be notified of a 
re-auth. If no re-auth occurs, it waits the entire 5 seconds.
 - With the old code, the server waits 200 ms before attempting to process the 
event again (which includes asking for authorization again). The 
SimulatedExpirationSecurityManager randomly decides whether to authorize the 
event; 99% of the time, it returns true, so the second request will almost 
always succeed.
 - So without any external event (like new credentials), as the sketch after 
this list shows:
 -- With the old code, the Message Dispatcher processes the event successfully 
after 200 ms with no client disconnect
 -- With the new code, the Message Dispatcher waits 5 seconds and then 
disconnects the client
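A condensed sketch of the two behaviors (hypothetical names; 
SecurityException stands in for AuthenticationExpiredException):
{noformat}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class DispatcherBehaviorSketch {
  interface Dispatcher {
    void dispatchMessage(Object event) throws SecurityException;
  }

  // Old behavior: retry every 200 ms for up to 5 seconds; each retry asks
  // the SecurityManager for authorization again, which almost always
  // succeeds on the second attempt.
  static boolean oldBehavior(Dispatcher d, Object event) throws InterruptedException {
    long deadline = System.currentTimeMillis() + 5_000;
    while (System.currentTimeMillis() < deadline) {
      try {
        d.dispatchMessage(event);
        return true;
      } catch (SecurityException expired) {
        Thread.sleep(200);
      }
    }
    return false;
  }

  // New behavior: wait up to 5 seconds for a re-auth notification. A GFE 9.0
  // client never sends one, so the full 5 seconds elapse and the client is
  // then disconnected.
  static boolean newBehavior(CountDownLatch reAuthSignal) throws InterruptedException {
    return reAuthSignal.await(5, TimeUnit.SECONDS);
  }
}
{noformat}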

> Regression in geode-native test 
> CqPlusAuthInitializeTest.reAuthenticateWithDurable
> --
>
> Key: GEODE-10144
> URL: https://issues.apache.org/jira/browse/GEODE-10144
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Affects Versions: 1.15.0
>Reporter: Blake Bender
>Assignee: Jinmei Liao
>Priority: Major
>  Labels: blocks-1.15.0, needsTriage
> Fix For: 1.15.0
>
>
> This test is failing across the board in the `geode-native` PR pipeline.  
> Main develop pipeline is green only because nothing can get through the PR 
> pipeline to clear checkin gates.  We have green CI runs with 1.15. build 918, 
> then it started failing when we picked up build 924.  
>  
> [~moleske] tracked this back to this commit:  
> [https://github.com/apache/geode/commit/2554f42b925f2b9b8ca7eee14c7a887436b1d9db|https://github.com/apache/geode/commit/2554f42b925f2b9b8ca7eee14c7a887436b1d9db].
>   See his notes in `geode-native` PR # 947 
> ([https://github.com/apache/geode-native/pull/947])



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-10144) Regression in geode-native test CqPlusAuthInitializeTest.reAuthenticateWithDurable

2022-03-22 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510954#comment-17510954
 ] 

Barrett Oglesby commented on GEODE-10144:
-

Here is why this test passes with the previous server code.

In the previous server code, after the Client Message Dispatcher caught an 
AuthenticationExpiredException, it slept for 200 ms before trying again. It 
did this for up to 5 seconds before giving up. Each time it retried, it asked 
for authorization again.

Here is a case where SimulatedExpirationSecurityManager.authorize throws an 
AuthenticationExpiredException:
{noformat}
[warn 2022/03/22 14:59:04.110 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to dispatchMessage 
operation=AFTER_CREATE; key=key130

[warn 2022/03/22 14:59:04.110 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
SimulatedExpirationSecurityManager.authorize about to throw 
AuthenticationExpiredException

[warn 2022/03/22 14:59:04.110 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher caught AuthenticationExpiredException

[warn 2022/03/22 14:59:04.110 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher skipped sending ClientReAuthenticateMessage 
clientVersion=GFE 9.0
{noformat}
The Client Message Dispatcher sleeps for 200 ms:
{noformat}
[warn 2022/03/22 14:59:04.110 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to sleep 1 for 200 ms

[warn 2022/03/22 14:59:04.311 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher done sleep 1
{noformat}
When it wakes up, it checks for authorization again. This time, the 
SimulatedExpirationSecurityManager returns true, so the message is sent:
{noformat}
[warn 2022/03/22 14:59:04.311 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to dispatchMessage 
operation=AFTER_CREATE; key=key130

[warn 2022/03/22 14:59:04.311 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
SimulatedExpirationSecurityManager.authorize about to return true

[warn 2022/03/22 14:59:04.311 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher done dispatchMessage operation=AFTER_CREATE; 
key=key130
{noformat}
This path is not relying on outside operations to notify the Client Message 
Dispatcher. The SimulatedExpirationSecurityManager authorizes the operation 
after the sleep.

So at the end of the run, when there are no client operations, the Client 
Message Dispatcher is most likely only going to sleep 200 ms. There is never 
going to be a 5-second wait.

I did see a few times where the Client Message Dispatcher slept twice through 
the loop (so 400 ms):
{noformat}
[warn 2022/03/22 14:59:23.924 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to dispatchMessage 
operation=AFTER_UPDATE; key=key4820

[warn 2022/03/22 14:59:23.924 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
SimulatedExpirationSecurityManager.authorize about to throw 
AuthenticationExpiredException

[warn 2022/03/22 14:59:23.924 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher caught AuthenticationExpiredException

[warn 2022/03/22 14:59:23.924 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher skipped sending ClientReAuthenticateMessage 
clientVersion=GFE 9.0

[warn 2022/03/22 14:59:23.924 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to sleep 1 for 200 ms

[warn 2022/03/22 14:59:24.124 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher done sleep 1

[warn 2022/03/22 14:59:24.124 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to dispatchMessage 
operation=AFTER_UPDATE; key=key4820

[warn 2022/03/22 14:59:24.124 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
SimulatedExpirationSecurityManager.authorize about to throw 
AuthenticationExpiredException

[warn 2022/03/22 14:59:24.124 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher caught AuthenticationExpiredException

[warn 2022/03/22 14:59:24.124 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to sleep 2 for 

[jira] [Commented] (GEODE-10144) Regression in geode-native test CqPlusAuthInitializeTest.reAuthenticateWithDurable

2022-03-22 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510953#comment-17510953
 ] 

Barrett Oglesby commented on GEODE-10144:
-

Even though this JIRA is resolved, I did some analysis on it.

I see what's going on in this test.

At the beginning of the test, client cache operations are occurring 
simultaneously with message dispatching from the server to the client.

Here is a ServerConnection processing puts:
{noformat}
[warn 2022/03/22 15:40:01.096 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX Put70.cmdExecute operation=UPDATE; 
key=key50

[warn 2022/03/22 15:40:01.096 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX 
SimulatedExpirationSecurityManager.authorize about to return true

[warn 2022/03/22 15:40:01.099 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX Put70.cmdExecute operation=UPDATE; 
key=key51

[warn 2022/03/22 15:40:01.099 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX 
SimulatedExpirationSecurityManager.authorize about to return true

[warn 2022/03/22 15:40:01.101 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX Put70.cmdExecute operation=UPDATE; 
key=key52

[warn 2022/03/22 15:40:01.102 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX 
SimulatedExpirationSecurityManager.authorize about to return true

[warn 2022/03/22 15:40:01.104 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX Put70.cmdExecute operation=UPDATE; 
key=key53

[warn 2022/03/22 15:40:01.104 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX 
SimulatedExpirationSecurityManager.authorize about to return true

[warn 2022/03/22 15:40:01.106 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX Put70.cmdExecute operation=UPDATE; 
key=key54

[warn 2022/03/22 15:40:01.107 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x50] XXX 
SimulatedExpirationSecurityManager.authorize about to return true
{noformat}
At the same time, the Client Message Dispatcher is dispatching events to the 
client:
{noformat}
[warn 2022/03/22 15:40:01.098 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to dispatchMessage 
operation=AFTER_CREATE; key=key50

[warn 2022/03/22 15:40:01.098 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
SimulatedExpirationSecurityManager.authorize about to return true

[warn 2022/03/22 15:40:01.099 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher done dispatchMessage operation=AFTER_CREATE; 
key=key50

[warn 2022/03/22 15:40:01.101 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to dispatchMessage 
operation=AFTER_CREATE; key=key51

[warn 2022/03/22 15:40:01.101 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
SimulatedExpirationSecurityManager.authorize about to return true

[warn 2022/03/22 15:40:01.101 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher done dispatchMessage operation=AFTER_CREATE; 
key=key51

[warn 2022/03/22 15:40:01.103 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to dispatchMessage 
operation=AFTER_CREATE; key=key52

[warn 2022/03/22 15:40:01.104 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
SimulatedExpirationSecurityManager.authorize about to return true

[warn 2022/03/22 15:40:01.104 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher done dispatchMessage operation=AFTER_CREATE; 
key=key52

[warn 2022/03/22 15:40:01.106 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher about to dispatchMessage 
operation=AFTER_CREATE; key=key53

[warn 2022/03/22 15:40:01.106 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
SimulatedExpirationSecurityManager.authorize about to return true

[warn 2022/03/22 15:40:01.106 PDT 
CqPlusAuthInitializeTest_reAuthenticateWithDurable_server_0  tid=0x51] XXX 
MessageDispatcher.runDispatcher done dispatchMessage operation=AFTER_CREATE; 
key=key53
{noformat}
While dispatching, the Client Message Dispatcher requests authorization, which 
fails. Since the NC is not the latest version, no ClientReAuthenticateMessage 
is sent to it. The dispatcher waits anyway in case another operation updates 
the credentials:
{noformat}
[warn 2022/03/22 15:40:01.109 PDT 

[jira] [Resolved] (GEODE-9910) Failure to auto-reconnect upon network partition

2022-03-10 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-9910.

Fix Version/s: 1.12.10
   1.13.9
   1.14.5
   1.15.0
   Resolution: Fixed

> Failure to auto-reconnect upon network partition
> 
>
> Key: GEODE-9910
> URL: https://issues.apache.org/jira/browse/GEODE-9910
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Surya Mudundi
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: GeodeOperationAPI, blocks-1.15.0, needsTriage, 
> pull-request-available
> Fix For: 1.12.10, 1.13.9, 1.14.5, 1.15.0
>
> Attachments: geode-logs.zip
>
>
> A two-node cluster with embedded locators failed to auto-reconnect: node-1 
> experienced a network outage for a couple of minutes, and when node-1 
> recovered from the outage, node-2 failed to auto-reconnect.
> node-2 tried to re-connect to node-1 as:
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #1.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #2.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #3.
> Finally reported below error after 3 attempts as:
> INFO  
> [org.apache.geode.logging.internal.LoggingProviderLoader]-[ReconnectThread] 
> [] Using org.apache.geode.logging.internal.SimpleLoggingProvider for service 
> org.apache.geode.logging.internal.spi.LoggingProvider
> INFO  [org.apache.geode.internal.InternalDataSerializer]-[ReconnectThread] [] 
> initializing InternalDataSerializer with 0 services
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] performing a quorum check to see if location services can be started early
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Quorum check passed - allowing location services to start early
> WARN  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Exception occurred while trying to connect the system during reconnect
> java.lang.IllegalStateException: A locator can not be created because one 
> already exists in this JVM.
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:298)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:273)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.startInitLocator(InternalDistributedSystem.java:916)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:768)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2326)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1187)
>  ~[geode-membership-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1811)
>  ~[geode-membership-1.14.0.jar:?]
>         at java.lang.Thread.run(Thread.java:829) [?:?]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-10103) Rebalance with no setting for include-region doesn't work for subregions

2022-03-10 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-10103.
-
Fix Version/s: 1.12.10
   1.13.9
   1.14.5
   1.15.0
   Resolution: Fixed

> Rebalance with no setting for include-region doesn't work for subregions
> 
>
> Key: GEODE-10103
> URL: https://issues.apache.org/jira/browse/GEODE-10103
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage, pull-request-available
> Fix For: 1.12.10, 1.13.9, 1.14.5, 1.15.0
>
>
> Executing a command like this produces no output for the rebalance command 
> even though a region exists to rebalance:
> {noformat}
>  gfsh -e "connect --locator=localhost[23456]" -e "rebalance"{noformat}
> Output:
> {noformat}
> ./rebalance.sh 
> (1) Executing - connect --locator=localhost[23456]
> Connecting to Locator at [host=localhost, port=23456] ..
> Connecting to Manager at [host=192.168.1.5, port=1099] ..
> Successfully connected to: [host=192.168.1.5, port=1099]
> You are connected to a cluster of version: 1.16.0-build.0
> (2) Executing - rebalance{noformat}
> Running it directly from gfsh does the same:
> {noformat}
> gfsh>rebalance
> gfsh> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-10103) Rebalance with no setting for include-region doesn't work for subregions

2022-03-04 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-10103:
---

Assignee: Barrett Oglesby

> Rebalance with no setting for include-region doesn't work for subregions
> 
>
> Key: GEODE-10103
> URL: https://issues.apache.org/jira/browse/GEODE-10103
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage
>
> Executing a command like this produces no output for the rebalance command 
> even though a region exists to rebalance:
> {noformat}
>  gfsh -e "connect --locator=localhost[23456]" -e "rebalance"{noformat}
> Output:
> {noformat}
> ./rebalance.sh 
> (1) Executing - connect --locator=localhost[23456]
> Connecting to Locator at [host=localhost, port=23456] ..
> Connecting to Manager at [host=192.168.1.5, port=1099] ..
> Successfully connected to: [host=192.168.1.5, port=1099]
> You are connected to a cluster of version: 1.16.0-build.0
> (2) Executing - rebalance{noformat}
> Running it directly from gfsh does the same:
> {noformat}
> gfsh>rebalance
> gfsh> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-10103) Rebalance with no setting for include-region doesn't work for subregions

2022-03-04 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-10103:
---

 Summary: Rebalance with no setting for include-region doesn't work 
for subregions
 Key: GEODE-10103
 URL: https://issues.apache.org/jira/browse/GEODE-10103
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Reporter: Barrett Oglesby


Executing a command like this produces no output for the rebalance command even 
though a region exists to rebalance:
{noformat}
 gfsh -e "connect --locator=localhost[23456]" -e "rebalance"{noformat}
Output:
{noformat}
./rebalance.sh 

(1) Executing - connect --locator=localhost[23456]

Connecting to Locator at [host=localhost, port=23456] ..
Connecting to Manager at [host=192.168.1.5, port=1099] ..
Successfully connected to: [host=192.168.1.5, port=1099]

You are connected to a cluster of version: 1.16.0-build.0

(2) Executing - rebalance{noformat}
Running it directly from gfsh does the same:
{noformat}
gfsh>rebalance
gfsh> {noformat}
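Until this is fixed, naming the region explicitly (including a subregion path) 
may work around the missing default; a hypothetical, untested example, 
assuming a subregion at /parent/child:
{noformat}
gfsh -e "connect --locator=localhost[23456]" -e "rebalance --include-region=/parent/child"
{noformat}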



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (GEODE-9910) Failure to auto-reconnect upon network partition

2022-02-22 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494856#comment-17494856
 ] 

Barrett Oglesby edited comment on GEODE-9910 at 2/22/22, 6:35 PM:
--

With a modification to the product to simulate a failed JoinRequestMessage, I 
can reproduce this issue.
h3. Test
 # Start server 1 (becomes coordinator)
 # Start server 2
 # Play dead server 1
 # The servers disconnect from each other
 # Server 2 disconnects from the distributed system since it doesn't have quorum

When server 2 reconnects, it:
 - establishes quorum
 - starts the locator
 - is unable to join the distributed system (due to the modification I made)
 - attempts to reconnect again
 - fails because the locator is already started (see the sketch below)


was (Author: barry.oglesby):
With a modification to the product to simulate a failed JoinRequestMessage, I 
can reproduce this issue.

h3. Test
# Start server 1 (becomes coordinator)
# Start server 2
# Play dead server 2
# The servers disconnect from each other
# Server 2 disconnects from the distributed system since it doesn't have quorum

When server 2 reconnects, it:
 - establishes quorum
 - starts the locator
 - is unable to join the distributed system (due to the modification I made)
 - attempts to reconnect again
 - fails because the locator is already started
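A hedged sketch of the failure shape (a hypothetical reconnect loop, not the 
actual Geode fix): the early-started locator from a failed join attempt has to 
be stopped before the next attempt, otherwise {{InternalLocator.createLocator}} 
throws the IllegalStateException quoted in the description.
{noformat}
import org.apache.geode.distributed.internal.InternalLocator;

class ReconnectSketch {
  // Hypothetical stand-in for: quorum check, early locator start, join attempt.
  boolean tryConnect() {
    return false;
  }

  void reconnect() {
    boolean reconnected = false;
    for (int attempt = 1; !reconnected && attempt <= 3; attempt++) {
      // Without this guard, attempt 2 and later fail with
      // "A locator can not be created because one already exists in this JVM."
      if (InternalLocator.hasLocator()) {
        InternalLocator.getLocator().stop();
      }
      reconnected = tryConnect();
    }
  }
}
{noformat}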

> Failure to auto-reconnect upon network partition
> 
>
> Key: GEODE-9910
> URL: https://issues.apache.org/jira/browse/GEODE-9910
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Surya Mudundi
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: GeodeOperationAPI, blocks-1.15.0, needsTriage
> Attachments: geode-logs.zip
>
>
> A two-node cluster with embedded locators failed to auto-reconnect: node-1 
> experienced a network outage for a couple of minutes, and when node-1 
> recovered from the outage, node-2 failed to auto-reconnect.
> node-2 tried to re-connect to node-1 as:
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #1.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #2.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #3.
> Finally reported below error after 3 attempts as:
> INFO  
> [org.apache.geode.logging.internal.LoggingProviderLoader]-[ReconnectThread] 
> [] Using org.apache.geode.logging.internal.SimpleLoggingProvider for service 
> org.apache.geode.logging.internal.spi.LoggingProvider
> INFO  [org.apache.geode.internal.InternalDataSerializer]-[ReconnectThread] [] 
> initializing InternalDataSerializer with 0 services
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] performing a quorum check to see if location services can be started early
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Quorum check passed - allowing location services to start early
> WARN  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Exception occurred while trying to connect the system during reconnect
> java.lang.IllegalStateException: A locator can not be created because one 
> already exists in this JVM.
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:298)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:273)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.startInitLocator(InternalDistributedSystem.java:916)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:768)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> 

[jira] [Commented] (GEODE-9910) Failure to auto-reconnect upon network partition

2022-02-18 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494856#comment-17494856
 ] 

Barrett Oglesby commented on GEODE-9910:


With a modification to the product to simulate a failed JoinRequestMessage, I 
can reproduce this issue.

h3. Test
# Start server 1 (becomes coordinator)
# Start server 2
# Play dead server 2
# The servers disconnect from each other
# Server 2 disconnects from the distributed system since it doesn't have quorum

When server 2 reconnects, it:
 - establishes quorum
 - starts the locator
 - is unable to join the distributed system (due to the modification I made)
 - attempts to reconnect again
 - fails because the locator is already started

> Failure to auto-reconnect upon network partition
> 
>
> Key: GEODE-9910
> URL: https://issues.apache.org/jira/browse/GEODE-9910
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Surya Mudundi
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: GeodeOperationAPI, blocks-1.15.0, needsTriage
> Attachments: geode-logs.zip
>
>
> A two-node cluster with embedded locators failed to auto-reconnect: node-1 
> experienced a network outage for a couple of minutes, and when node-1 
> recovered from the outage, node-2 failed to auto-reconnect.
> node-2 tried to re-connect to node-1 as:
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #1.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #2.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #3.
> Finally reported below error after 3 attempts as:
> INFO  
> [org.apache.geode.logging.internal.LoggingProviderLoader]-[ReconnectThread] 
> [] Using org.apache.geode.logging.internal.SimpleLoggingProvider for service 
> org.apache.geode.logging.internal.spi.LoggingProvider
> INFO  [org.apache.geode.internal.InternalDataSerializer]-[ReconnectThread] [] 
> initializing InternalDataSerializer with 0 services
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] performing a quorum check to see if location services can be started early
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Quorum check passed - allowing location services to start early
> WARN  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Exception occurred while trying to connect the system during reconnect
> java.lang.IllegalStateException: A locator can not be created because one 
> already exists in this JVM.
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:298)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:273)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.startInitLocator(InternalDistributedSystem.java:916)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:768)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2326)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1187)
>  ~[geode-membership-1.14.0.jar:?]
>         at 
> 

[jira] [Comment Edited] (GEODE-9910) Failure to auto-reconnect upon network partition

2022-02-18 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494855#comment-17494855
 ] 

Barrett Oglesby edited comment on GEODE-9910 at 2/19/22, 12:42 AM:
---

Here is some analysis of this issue.
h3. Server Addresses

node 1:

membership: 10.196.55.141(15661):42000
locator: 10.196.55.141:10335

node 2:

membership: 10.196.55.142(19002):42000
locator: 10.196.55.142:10335
h3. Node2 Initial Disconnect

node2 lost connectivity with node1 and removed it:
{noformat}
2021-11-28 04:03:45,084 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure 
Detection thread 9] [] Availability check failed for member 
10.196.55.141(15661):42000
2021-11-28 04:03:45,084 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure 
Detection thread 9] [] Requesting removal of suspect member 
10.196.55.141(15661):42000
2021-11-28 04:03:45,085 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure 
Detection thread 9] [] This member is becoming the membership coordinator with 
address 10.196.55.142(19002):42000
{noformat}
It then realized that quorum had been lost (node1 was coordinator with 
weight=15; node2 was not coordinator with weight=10):
{noformat}
2021-11-28 04:03:45,091 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] View Creator thread is starting
2021-11-28 04:03:45,091 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] []   10.196.55.141(15661):42000 had a weight 
of 15
2021-11-28 04:03:45,092 WARN  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] total weight lost in this view change is 15 of 25.  
Quorum has been lost!
2021-11-28 04:03:45,092 FATAL 
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] Possible loss of quorum due to the loss of 1 cache 
processes: [10.196.55.141(15661):42000]
{noformat}
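The weight arithmetic behind these messages (a sketch; the weights come from the 
log lines above, and quorum is treated here as a strict majority of the view's 
total weight, which matches the numbers in this analysis):
{noformat}
public class QuorumWeights {
  public static void main(String[] args) {
    int node1Weight = 15; // lead member, "had a weight of 15" above
    int node2Weight = 10; // ordinary cache process
    int totalWeight = node1Weight + node2Weight; // 25
    int neededForQuorum = totalWeight / 2 + 1;   // 13: a strict majority of 25
    // losing node1's 15 of 25 -> "Quorum has been lost!"
    System.out.println("node2 has " + node2Weight + ", needs " + neededForQuorum
        + " -> quorum=" + (node2Weight >= neededForQuorum)); // false
  }
}
{noformat}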
And disconnected itself from the distributed system:
{noformat}
2021-11-28 04:03:46,093 FATAL 
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] Membership service failure: Exiting due to possible 
network partition event due to loss of 1 cache processes: 
[10.196.55.141(15661):42000]
org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException:
 Exiting due to possible network partition event due to loss of 1 cache 
processes: [10.196.55.141(15661):42000]
at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:1787)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1122)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.access$1300(GMSJoinLeave.java:80)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.prepareAndSendView(GMSJoinLeave.java:2588)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.sendInitialView(GMSJoinLeave.java:2204)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.run(GMSJoinLeave.java:2286)
 [geode-membership-1.14.0.jar:?]
{noformat}
It stopped its locator:
{noformat}
2021-11-28 04:03:46,794 INFO 
[org.apache.geode.distributed.internal.InternalLocator]-[ReconnectThread] [] 
Distribution Locator on 
vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2/10.196.55.142 is 
stopped{noformat}
h3. Node2 Reconnect Attempt 1
{noformat}
2021-11-28 04:04:46,800 INFO  
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
 [] Attempting to reconnect to the distributed system.  This is attempt #1.
{noformat}
The first reconnect attempt failed to get quorum (it needed a weight of 13 but 
had only 10):
{noformat}
2021-11-28 04:04:46,810 INFO  
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
 [] performing a quorum check to see if location services can be started early
2021-11-28 04:04:46,810 INFO  
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
 [] beginning quorum check with GMSQuorumChecker on view 
View[10.196.55.141(15661):42000|1] members: 
[10.196.55.141(15661):42000{lead}, 10.196.55.142(19002):42000]
2021-11-28 04:04:46,810 INFO  
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
 [] quorum check: sending request to 10.196.55.141(15661):42000
2021-11-28 

[jira] [Comment Edited] (GEODE-9910) Failure to auto-reconnect upon network partition

2022-02-18 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494855#comment-17494855
 ] 

Barrett Oglesby edited comment on GEODE-9910 at 2/19/22, 12:41 AM:
---

Here is some analysis of this issue.
h3. Server Addresses

node 1:

membership: 10.196.55.141(15661):42000
locator: 10.196.55.141:10335

node 2:

membership: 10.196.55.142(19002):42000
locator: 10.196.55.142:10335
h3. Node2 Initial Disconnect

node2 lost connectivity with node1 and removed it:
{noformat}
2021-11-28 04:03:45,084 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure 
Detection thread 9] [] Availability check failed for member 
10.196.55.141(15661):42000
2021-11-28 04:03:45,084 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure 
Detection thread 9] [] Requesting removal of suspect member 
10.196.55.141(15661):42000
2021-11-28 04:03:45,085 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure 
Detection thread 9] [] This member is becoming the membership coordinator with 
address 10.196.55.142(19002):42000
{noformat}
It then realized that quorum had been lost (node1 was coordinator with 
weight=15; node2 was not coordinator with weight=10):
{noformat}
2021-11-28 04:03:45,091 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] View Creator thread is starting
2021-11-28 04:03:45,091 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] []   10.196.55.141(15661):42000 had a weight 
of 15
2021-11-28 04:03:45,092 WARN  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] total weight lost in this view change is 15 of 25.  
Quorum has been lost!
2021-11-28 04:03:45,092 FATAL 
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] Possible loss of quorum due to the loss of 1 cache 
processes: [10.196.55.141(15661):42000]
{noformat}
And disconnected itself from the distributed system:
{noformat}
2021-11-28 04:03:46,093 FATAL 
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] Membership service failure: Exiting due to possible 
network partition event due to loss of 1 cache processes: 
[10.196.55.141(15661):42000]
org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException:
 Exiting due to possible network partition event due to loss of 1 cache 
processes: [10.196.55.141(15661):42000]
at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:1787)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1122)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.access$1300(GMSJoinLeave.java:80)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.prepareAndSendView(GMSJoinLeave.java:2588)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.sendInitialView(GMSJoinLeave.java:2204)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.run(GMSJoinLeave.java:2286)
 [geode-membership-1.14.0.jar:?]
{noformat}
It stopped its locator:
{noformat}
2021-11-28 04:03:46,794 INFO 
[org.apache.geode.distributed.internal.InternalLocator]-[ReconnectThread] [] 
Distribution Locator on 
vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2/10.196.55.142 is 
stopped{noformat}
h3. Node2 Reconnect Attempt 1
{noformat}
2021-11-28 04:04:46,800 INFO  
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
 [] Attempting to reconnect to the distributed system.  This is attempt #1.
{noformat}
The first reconnect attempt failed to get quorum (it needed a weight of 13 but 
had only 10):
{noformat}
2021-11-28 04:04:46,810 INFO  
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
 [] performing a quorum check to see if location services can be started early
2021-11-28 04:04:46,810 INFO  
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
 [] beginning quorum check with GMSQuorumChecker on view 
View[10.196.55.141(15661):42000|1] members: 
[10.196.55.141(15661):42000{lead}, 10.196.55.142(19002):42000]
2021-11-28 04:04:46,810 INFO  
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
 [] quorum check: sending request to 10.196.55.141(15661):42000
2021-11-28 

[jira] [Commented] (GEODE-9910) Failure to auto-reconnect upon network partition

2022-02-18 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494855#comment-17494855
 ] 

Barrett Oglesby commented on GEODE-9910:


Here is some analysis of this issue.
h3. Server Addresses

node 1:

membership: 10.196.55.141(15661):42000
locator: 10.196.55.141:10335

node 2:

membership: 10.196.55.142(19002):42000
locator: 10.196.55.142:10335
h3. Node2 Initial Disconnect

node2 lost connectivity with node1 and removed it:
{noformat}
2021-11-28 04:03:45,084 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure 
Detection thread 9] [] Availability check failed for member 
10.196.55.141(15661):42000
2021-11-28 04:03:45,084 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure 
Detection thread 9] [] Requesting removal of suspect member 
10.196.55.141(15661):42000
2021-11-28 04:03:45,085 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure 
Detection thread 9] [] This member is becoming the membership coordinator with 
address 10.196.55.142(19002):42000
{noformat}
It then realized that quorum had been lost (node1 was coordinator with 
weight=15; node2 was not coordinator with weight=10):
{noformat}
2021-11-28 04:03:45,091 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] View Creator thread is starting
2021-11-28 04:03:45,091 INFO  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] []   10.196.55.141(15661):42000 had a weight 
of 15
2021-11-28 04:03:45,092 WARN  
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] total weight lost in this view change is 15 of 25.  
Quorum has been lost!
2021-11-28 04:03:45,092 FATAL 
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] Possible loss of quorum due to the loss of 1 cache 
processes: [10.196.55.141(15661):42000]
{noformat}
And disconnected itself from the distributed system:
{noformat}
2021-11-28 04:03:46,093 FATAL 
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode 
Membership View Creator] [] Membership service failure: Exiting due to possible 
network partition event due to loss of 1 cache processes: 
[10.196.55.141(15661):42000]
org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException:
 Exiting due to possible network partition event due to loss of 1 cache 
processes: [10.196.55.141(15661):42000]
at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:1787)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1122)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.access$1300(GMSJoinLeave.java:80)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.prepareAndSendView(GMSJoinLeave.java:2588)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.sendInitialView(GMSJoinLeave.java:2204)
 [geode-membership-1.14.0.jar:?]
at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.run(GMSJoinLeave.java:2286)
 [geode-membership-1.14.0.jar:?]
{noformat}
It stopped its locator:
{noformat}
2021-11-28 04:03:46,794 INFO 
[org.apache.geode.distributed.internal.InternalLocator]-[ReconnectThread] [] 
Distribution Locator on 
vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2/10.196.55.142 is 
stopped{noformat}
h3. Node2 Reconnect Attempt 1
{noformat}
2021-11-28 04:04:46,800 INFO  
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
 [] Attempting to reconnect to the distributed system.  This is attempt #1.
{noformat}
The first reconnect attempt failed to get quorum (it needed a weight of 13 but 
had only 10):
{noformat}
2021-11-28 04:04:46,810 INFO  
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
 [] performing a quorum check to see if location services can be started early
2021-11-28 04:04:46,810 INFO  
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
 [] beginning quorum check with GMSQuorumChecker on view 
View[10.196.55.141(15661):42000|1] members: 
[10.196.55.141(15661):42000{lead}, 10.196.55.142(19002):42000]
2021-11-28 04:04:46,810 INFO  
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
 [] quorum check: sending request to 10.196.55.141(15661):42000
2021-11-28 04:04:46,810 INFO  

[jira] [Commented] (GEODE-9910) Failure to auto-reconnect upon network partition

2022-02-16 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493725#comment-17493725
 ] 

Barrett Oglesby commented on GEODE-9910:


Can you attach the full logs of both servers?

> Failure to auto-reconnect upon network partition
> 
>
> Key: GEODE-9910
> URL: https://issues.apache.org/jira/browse/GEODE-9910
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Surya Mudundi
>Priority: Major
>  Labels: GeodeOperationAPI, blocks-1.15.0​, needsTriage
>
> Two-node cluster with embedded locators failed to auto-reconnect when node-1 
> experienced a network outage for a couple of minutes, and when node-1 recovered 
> from the outage, node-2 failed to auto-reconnect.
> node-2 tried to re-connect to node-1 as:
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #1.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #2.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #3.
> Finally reported below error after 3 attempts as:
> INFO  
> [org.apache.geode.logging.internal.LoggingProviderLoader]-[ReconnectThread] 
> [] Using org.apache.geode.logging.internal.SimpleLoggingProvider for service 
> org.apache.geode.logging.internal.spi.LoggingProvider
> INFO  [org.apache.geode.internal.InternalDataSerializer]-[ReconnectThread] [] 
> initializing InternalDataSerializer with 0 services
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] performing a quorum check to see if location services can be started early
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Quorum check passed - allowing location services to start early
> WARN  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Exception occurred while trying to connect the system during reconnect
> java.lang.IllegalStateException: A locator can not be created because one 
> already exists in this JVM.
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:298)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:273)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.startInitLocator(InternalDistributedSystem.java:916)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:768)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2326)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1187)
>  ~[geode-membership-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1811)
>  ~[geode-membership-1.14.0.jar:?]
>         at java.lang.Thread.run(Thread.java:829) [?:?]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-9910) Failure to auto-reconnect upon network partition

2022-02-16 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9910:
--

Assignee: Barrett Oglesby

> Failure to auto-reconnect upon network partition
> 
>
> Key: GEODE-9910
> URL: https://issues.apache.org/jira/browse/GEODE-9910
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Surya Mudundi
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: GeodeOperationAPI, blocks-1.15.0​, needsTriage
>
> Two-node cluster with embedded locators failed to auto-reconnect when node-1 
> experienced a network outage for a couple of minutes, and when node-1 recovered 
> from the outage, node-2 failed to auto-reconnect.
> node-2 tried to re-connect to node-1 as:
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #1.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #2.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #3.
> Finally reported below error after 3 attempts as:
> INFO  
> [org.apache.geode.logging.internal.LoggingProviderLoader]-[ReconnectThread] 
> [] Using org.apache.geode.logging.internal.SimpleLoggingProvider for service 
> org.apache.geode.logging.internal.spi.LoggingProvider
> INFO  [org.apache.geode.internal.InternalDataSerializer]-[ReconnectThread] [] 
> initializing InternalDataSerializer with 0 services
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] performing a quorum check to see if location services can be started early
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Quorum check passed - allowing location services to start early
> WARN  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Exception occurred while trying to connect the system during reconnect
> java.lang.IllegalStateException: A locator can not be created because one 
> already exists in this JVM.
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:298)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:273)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.startInitLocator(InternalDistributedSystem.java:916)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:768)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2326)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1187)
>  ~[geode-membership-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1811)
>  ~[geode-membership-1.14.0.jar:?]
>         at java.lang.Thread.run(Thread.java:829) [?:?]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-10009) The CacheClientProxy for a durable client can be terminated when it shouldn't be

2022-02-08 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-10009.
-
Fix Version/s: 1.15.0
   Resolution: Fixed

> The CacheClientProxy for a durable client can be terminated when it shouldn't 
> be
> 
>
> Key: GEODE-10009
> URL: https://issues.apache.org/jira/browse/GEODE-10009
> Project: Geode
>  Issue Type: Bug
>  Components: client queues
>Affects Versions: 1.15.0
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: blocks-1.15.0​, pull-request-available
> Fix For: 1.15.0
>
>
> When the client connection is closed but the server has not left or crashed 
> (e.g. in the re-authentication failure case), it's possible for two threads in 
> a durable client to interleave in a way that creates an extra durable task on 
> the server, which eventually causes the CacheClientProxy to be terminated.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-10009) The CacheClientProxy for a durable client can be terminated when it shouldn't be

2022-02-01 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-10009:
---

Assignee: Barrett Oglesby

> The CacheClientProxy for a durable client can be terminated when it shouldn't 
> be
> 
>
> Key: GEODE-10009
> URL: https://issues.apache.org/jira/browse/GEODE-10009
> Project: Geode
>  Issue Type: Bug
>  Components: client queues
>Affects Versions: 1.15.0
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: needsTriage
>
> When the client connection is closed but the server has not left or crashed 
> (e.g. in the re-authentication failure case), it's possible for two threads in 
> a durable client to interleave in a way that creates an extra durable task on 
> the server, which eventually causes the CacheClientProxy to be terminated.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-10009) The CacheClientProxy for a durable client can be terminated when it shouldn't be

2022-02-01 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-10009:
---

 Summary: The CacheClientProxy for a durable client can be 
terminated when it shouldn't be
 Key: GEODE-10009
 URL: https://issues.apache.org/jira/browse/GEODE-10009
 Project: Geode
  Issue Type: Bug
  Components: client queues
Affects Versions: 1.15.0
Reporter: Barrett Oglesby


When the client connection is closed but the server has not left or crashed 
(e.g. in the re-authentication failure case), it's possible for two threads in a 
durable client to interleave in a way that creates an extra durable task on the 
server, which eventually causes the CacheClientProxy to be terminated.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-9913) A retried event can fail if the original event is still being processed and a new event for that same key occurs at the same time

2022-01-18 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-9913.

Fix Version/s: 1.15.0
   Resolution: Fixed

> A retried event can fail if the original event is still being processed and a 
> new event for that same key occurs at the same time
> -
>
> Key: GEODE-9913
> URL: https://issues.apache.org/jira/browse/GEODE-9913
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
> Fix For: 1.15.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9528) CI Failure: DistributionAdvisorIntegrationTest > verifyMembershipListenerIsRemovedAfterForceDisconnect

2022-01-18 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17478057#comment-17478057
 ] 

Barrett Oglesby commented on GEODE-9528:


I backported this change to support/1.14, support/1.13 and support/1.12.

> CI Failure: DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect
> --
>
> Key: GEODE-9528
> URL: https://issues.apache.org/jira/browse/GEODE-9528
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Affects Versions: 1.12.5, 1.13.5, 1.14.0
>Reporter: Owen Nichols
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.9, 1.13.7, 1.14.3, 1.15.0
>
>
> {noformat}
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect FAILED
> org.junit.ComparisonFailure: expected:<[fals]e> but was:<[tru]e>
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect(DistributionAdvisorIntegrationTest.java:57)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9528) CI Failure: DistributionAdvisorIntegrationTest > verifyMembershipListenerIsRemovedAfterForceDisconnect

2022-01-18 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-9528:
---
Fix Version/s: 1.12.9
   1.13.7
   1.14.3

> CI Failure: DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect
> --
>
> Key: GEODE-9528
> URL: https://issues.apache.org/jira/browse/GEODE-9528
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Affects Versions: 1.12.5, 1.13.5, 1.14.0
>Reporter: Owen Nichols
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.9, 1.13.7, 1.14.3, 1.15.0
>
>
> {noformat}
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect FAILED
> org.junit.ComparisonFailure: expected:<[fals]e> but was:<[tru]e>
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect(DistributionAdvisorIntegrationTest.java:57)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9913) A retried event can fail if the original event is still being processed and a new event for that same key occurs at the same time

2022-01-03 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9913:
--

 Summary: A retried event can fail if the original event is still 
being processed and a new event for that same key occurs at the same time
 Key: GEODE-9913
 URL: https://issues.apache.org/jira/browse/GEODE-9913
 Project: Geode
  Issue Type: Bug
  Components: client/server
Reporter: Barrett Oglesby






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-9913) A retried event can fail if the original event is still being processed and a new event for that same key occurs at the same time

2022-01-03 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9913:
--

Assignee: Barrett Oglesby

> A retried event can fail if the original event is still being processed and a 
> new event for that same key occurs at the same time
> -
>
> Key: GEODE-9913
> URL: https://issues.apache.org/jira/browse/GEODE-9913
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-9865) ConnectionManagerImpl forceCreateConnection to a specific server increments the count regardless whether the connection is successful

2021-12-06 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-9865.

Fix Version/s: 1.15.0
   Resolution: Fixed

> ConnectionManagerImpl forceCreateConnection to a specific server increments 
> the count regardless whether the connection is successful
> -
>
> Key: GEODE-9865
> URL: https://issues.apache.org/jira/browse/GEODE-9865
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Affects Versions: 1.14.0
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> *ConnectionManagerImpl forceCreateConnection* does:
> {noformat}
> private PooledConnection forceCreateConnection(ServerLocation serverLocation)
>     throws ServerRefusedConnectionException, ServerOperationException {
>   connectionAccounting.create();
>   try {
>     return createPooledConnection(serverLocation);
>   } catch (GemFireSecurityException e) {
>     throw new ServerOperationException(e);
>   }
> }{noformat}
> The call to *connectionAccounting.create()* increments the count. If 
> *createPooledConnection* is unsuccessful, the count is not decremented. This 
> causes the client to think there are more connections than there actually are.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9865) ConnectionManagerImpl forceCreateConnection to a specific server increments the count regardless whether the connection is successful

2021-11-30 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9865:
--

 Summary: ConnectionManagerImpl forceCreateConnection to a specific 
server increments the count regardless whether the connection is successful
 Key: GEODE-9865
 URL: https://issues.apache.org/jira/browse/GEODE-9865
 Project: Geode
  Issue Type: Bug
  Components: client/server
Affects Versions: 1.14.0
Reporter: Barrett Oglesby


*ConnectionManagerImpl forceCreateConnection* does:
{noformat}
private PooledConnection forceCreateConnection(ServerLocation serverLocation)
    throws ServerRefusedConnectionException, ServerOperationException {
  connectionAccounting.create();
  try {
    return createPooledConnection(serverLocation);
  } catch (GemFireSecurityException e) {
    throw new ServerOperationException(e);
  }
}{noformat}
The call to *connectionAccounting.create()* increments the count. If 
*createPooledConnection* is unsuccessful, the count is not decremented. This 
causes the client to think there are more connections than there actually are.
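
One way to address this would be to undo the optimistic increment whenever 
creation does not succeed, e.g. (a sketch only, not the committed fix; it 
assumes the accounting class has a cancelTryCreate()-style method that 
decrements the count):
{noformat}
private PooledConnection forceCreateConnection(ServerLocation serverLocation)
    throws ServerRefusedConnectionException, ServerOperationException {
  connectionAccounting.create();
  boolean created = false;
  try {
    PooledConnection connection = createPooledConnection(serverLocation);
    created = connection != null;
    return connection;
  } catch (GemFireSecurityException e) {
    throw new ServerOperationException(e);
  } finally {
    if (!created) {
      // undo the optimistic increment so the count reflects real connections
      connectionAccounting.cancelTryCreate();
    }
  }
}
{noformat}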



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-9865) ConnectionManagerImpl forceCreateConnection to a specific server increments the count regardless whether the connection is successful

2021-11-30 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9865:
--

Assignee: Barrett Oglesby

> ConnectionManagerImpl forceCreateConnection to a specific server increments 
> the count regardless whether the connection is successful
> -
>
> Key: GEODE-9865
> URL: https://issues.apache.org/jira/browse/GEODE-9865
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Affects Versions: 1.14.0
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>
> *ConnectionManagerImpl forceCreateConnection* does:
> {noformat}
> private PooledConnection forceCreateConnection(ServerLocation serverLocation)
>     throws ServerRefusedConnectionException, ServerOperationException {
>   connectionAccounting.create();
>   try {
>     return createPooledConnection(serverLocation);
>   } catch (GemFireSecurityException e) {
>     throw new ServerOperationException(e);
>   }
> }{noformat}
> The call to *connectionAccounting.create()* increments the count. If 
> *createPooledConnection* is unsuccessful, the count is not decremented. This 
> causes the client to think there are more connections than there actually are.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9664) Two different clients with the same durable id will both connect to the servers and receive messages

2021-10-05 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424732#comment-17424732
 ] 

Barrett Oglesby commented on GEODE-9664:


I checked the behavior of the client when there are no servers, since these 
durable client scenarios should behave similarly to the no-servers scenario.

When the Pool is created, QueueManagerImpl.initializeConnections attempts to 
create the connections. If there are no servers, the ConnectionList's 
primaryDiscoveryException is initialized like this:
{noformat}
List<ServerLocation> servers =
    findQueueServers(excludedServers, queuesNeeded, true, false, null);
if (servers == null || servers.isEmpty()) {
  scheduleRedundancySatisfierIfNeeded(redundancyRetryInterval);
  synchronized (lock) {
queueConnections = queueConnections.setPrimaryDiscoveryFailed(null);
lock.notifyAll();
  }
  return;
}
{noformat}
And the empty ConnectionList is created here:
{noformat}
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1333)
at 
org.apache.geode.cache.client.internal.QueueManagerImpl$ConnectionList.<init>(QueueManagerImpl.java:1318)
at 
org.apache.geode.cache.client.internal.QueueManagerImpl$ConnectionList.setPrimaryDiscoveryFailed(QueueManagerImpl.java:1337)
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.initializeConnections(QueueManagerImpl.java:439)
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.start(QueueManagerImpl.java:293)
at 
org.apache.geode.cache.client.internal.PoolImpl.start(PoolImpl.java:359)
at 
org.apache.geode.cache.client.internal.PoolImpl.finishCreate(PoolImpl.java:183)
at 
org.apache.geode.cache.client.internal.PoolImpl.create(PoolImpl.java:169)
at 
org.apache.geode.internal.cache.PoolFactoryImpl.create(PoolFactoryImpl.java:378)
{noformat}
Then, when Region.registerInterestForAllKeys is called, it invokes 
ServerRegionProxy.registerInterest, which:

- adds the key to the RegisterInterestTracker
- executes the RegisterInterestOp
- removes the key from the RegisterInterestTracker if the RegisterInterestOp 
fails

Here is the code in Region.registerInterestForAllKeys that does the above steps:
{noformat}
try {
  rit.addSingleInterest(region, key, interestType, policy, isDurable,
  receiveUpdatesAsInvalidates);
  result = RegisterInterestOp.execute(pool, regionName, key, interestType, 
policy,
  isDurable, receiveUpdatesAsInvalidates, regionDataPolicy);
  finished = true;
  return result;
} finally {
  if (!finished) {
rit.removeSingleInterest(region, key, interestType, isDurable,
receiveUpdatesAsInvalidates);
  }
}
{noformat}
The Connections are retrieved in QueueManagerImpl.getAllConnections. If there 
are none, a NoSubscriptionServersAvailableException wrapping the 
primaryDiscoveryException is thrown:
{noformat}
Exception in thread "main" 
org.apache.geode.cache.NoSubscriptionServersAvailableException: 
org.apache.geode.cache.NoSubscriptionServersAvailableException: Primary 
discovery failed.
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:191)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:428)
at 
org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:875)
at 
org.apache.geode.cache.client.internal.RegisterInterestOp.execute(RegisterInterestOp.java:58)
at 
org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterest(ServerRegionProxy.java:364)
at 
org.apache.geode.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3815)
at 
org.apache.geode.internal.cache.LocalRegion.registerInterestRegex(LocalRegion.java:3911)
at 
org.apache.geode.internal.cache.LocalRegion.registerInterestRegex(LocalRegion.java:3890)
at 
org.apache.geode.internal.cache.LocalRegion.registerInterestRegex(LocalRegion.java:3885)
at 
org.apache.geode.cache.Region.registerInterestForAllKeys(Region.java:1709)
{noformat}
Here is logging that shows all this behavior:
{noformat}
[warn 2021/10/04 10:58:22.184 PDT client-a-1  tid=0x1] XXX 
ConnectionList. 
primaryDiscoveryException=org.apache.geode.cache.NoSubscriptionServersAvailableException:
 Primary discovery failed.

[warn 2021/10/04 10:58:22.238 PDT client-a-1  tid=0x1] XXX 
RegisterInterestTracker.addSingleInterest key=.*; rieInterests={.*=KEYS_VALUES}

[warn 2021/10/04 10:58:22.238 PDT client-a-1  tid=0x1] XXX 
ServerRegionProxy.registerInterest about to execute RegisterInterestOp

[warn 2021/10/04 10:58:22.244 PDT client-a-1  tid=0x1] XXX 
QueueManagerImpl.getAllConnections about to throw 
exception=org.apache.geode.cache.NoSubscriptionServersAvailableException: 
org.apache.geode.cache.NoSubscriptionServersAvailableException: Primary 

[jira] [Created] (GEODE-9664) Two different clients with the same durable id will both connect to the servers and receive messages

2021-10-01 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9664:
--

 Summary: Two different clients with the same durable id will both 
connect to the servers and receive messages
 Key: GEODE-9664
 URL: https://issues.apache.org/jira/browse/GEODE-9664
 Project: Geode
  Issue Type: Bug
  Components: client queues
Reporter: Barrett Oglesby


There are two cases:
 # The number of queues is the same as the number of servers (e.g. client with 
subscription-redundancy=1 and 2 servers)
 # The number of queues is less than the number of servers (e.g. client with 
subscription-redundancy=0 and 2 servers)

h2. Case 1
 In this case, the client first attempts to connect to the primary and fails.
{noformat}
[warn 2021/10/01 14:37:56.209 PDT server-1  tid=0x4b] XXX CacheClientNotifier.registerClientInternal about to register 
clientProxyMembershipID=identity(127.0.0.1(client-a-2:89832:loner):61596:fad3ca3d:client-a-2,connection=2,durableAttributes=DurableClientAttributes[id=client-a;
 timeout=300])

[warn 2021/10/01 14:37:56.209 PDT server-1  tid=0x4b] XXX CacheClientNotifier.registerClientInternal existing 
proxy=CacheClientProxy[identity(127.0.0.1(client-a-1:89806:loner):61573:10a9ca3d:client-a-1,connection=2,durableAttributes=DurableClientAttributes[id=client-a;
 timeout=300]); port=61581; primary=true; version=GEODE 1.15.0]

[warn 2021/10/01 14:37:56.210 PDT server-1  tid=0x4b] XXX CacheClientNotifier.registerClientInternal existing proxy 
isPaused=false

[warn 2021/10/01 14:37:56.210 PDT server-1  tid=0x4b] The requested durable client has the same identifier ( client-a ) 
as an existing durable client ( 
CacheClientProxy[identity(127.0.0.1(client-a-1:89806:loner):61573:10a9ca3d:client-a-1,connection=2,durableAttributes=DurableClientAttributes[id=client-a;
 timeout=300]); port=61581; primary=true; version=GEODE 1.15.0] ). Duplicate 
durable clients are not allowed.

[warn 2021/10/01 14:37:56.210 PDT server-1  tid=0x4b] CacheClientNotifier: Unsuccessfully registered client with 
identifier 
identity(127.0.0.1(client-a-2:89832:loner):61596:fad3ca3d:client-a-2,connection=2,durableAttributes=DurableClientAttributes[id=client-a;
 timeout=300]) and response code 64
{noformat}
It then attempts to connect to the secondary and succeeds.
{noformat}
[warn 2021/10/01 14:37:56.215 PDT server-2  tid=0x47] XXX CacheClientNotifier.registerClientInternal about to register 
clientProxyMembershipID=identity(127.0.0.1(client-a-2:89832:loner):61596:fad3ca3d:client-a-2,connection=2,durableAttributes=DurableClientAttributes[id=client-a;
 timeout=300])

[warn 2021/10/01 14:37:56.215 PDT server-2  tid=0x47] XXX CacheClientNotifier.registerClientInternal existing 
proxy=CacheClientProxy[identity(127.0.0.1(client-a-1:89806:loner):61573:10a9ca3d:client-a-1,connection=2,durableAttributes=DurableClientAttributes[id=client-a;
 timeout=300]); port=61578; primary=false; version=GEODE 1.15.0]

[warn 2021/10/01 14:37:56.216 PDT server-2  tid=0x47] XXX CacheClientNotifier.registerClientInternal existing proxy 
isPaused=true

[warn 2021/10/01 14:37:56.217 PDT server-2  tid=0x47] XXX CacheClientNotifier.registerClientInternal reinitialized 
existing 
proxy=CacheClientProxy[identity(127.0.0.1(client-a-1:89806:loner):61573:10a9ca3d:client-a-1,connection=2,durableAttributes=DurableClientAttributes[id=client-a;
 timeout=300]); port=61578; primary=true; version=GEODE 1.15.0]
{noformat}
The previous secondary is reinitialized and made into a primary. Both queues 
will dispatch events.

The CacheClientNotifier.registerClientInternal method, invoked when a client 
connects, does:
{noformat}
if (cacheClientProxy.isPaused()) {
  ...
  cacheClientProxy.reinitialize(...);
} else {
  unsuccessfulMsg = String.format("The requested durable client has the same 
identifier ( %s ) as an existing durable client...);
  logger.warn(unsuccessfulMsg);
}
{noformat}
The CacheClientProxy is paused when the durable client it represents has 
disconnected. Unfortunately, a secondary CacheClientProxy is also paused. So, 
this check is not good enough to prevent a duplicate durable client from 
connecting.

There are a few things that can also be checked. One of them is:
{noformat}
cacheClientProxy.getCommBuffer() == null
{noformat}
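A self-contained model of the combined check (not Geode source; it assumes, as 
described above, that the comm buffer is null only once the durable client has 
really disconnected, while a paused secondary still holds one):
{noformat}
class ProxyState {
  final boolean paused;
  final Object commBuffer; // null once the durable client has disconnected

  ProxyState(boolean paused, Object commBuffer) {
    this.paused = paused;
    this.commBuffer = commBuffer;
  }

  // reinitialize only a proxy whose durable client is truly gone
  boolean mayReinitialize() {
    return paused && commBuffer == null;
  }
}

// new ProxyState(true, new Object()).mayReinitialize() -> false (paused secondary)
// new ProxyState(true, null).mayReinitialize()         -> true  (disconnected durable client)
{noformat}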
With that check added, when the client attempts to connect to the secondary, it 
fails just like it does with the primary.

The client then exits with this exception:
{noformat}
org.apache.geode.cache.NoSubscriptionServersAvailableException: 
org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not 
initialize a primary queue on startup. No queue servers available.
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:191)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:428)
at 

[jira] [Created] (GEODE-9620) The CacheServerStats currentQueueConnections statistic is incremented and decremented twice per client queue

2021-09-21 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9620:
--

 Summary: The CacheServerStats currentQueueConnections statistic is 
incremented and decremented twice per client queue
 Key: GEODE-9620
 URL: https://issues.apache.org/jira/browse/GEODE-9620
 Project: Geode
  Issue Type: Bug
  Components: client queues
Reporter: Barrett Oglesby


The CacheServerStats currentQueueConnections statistic is incremented and 
decremented twice per client queue

When a client with subscription enabled connects to the server, the 
CacheServerStats currentQueueConnections statistic is incremented twice.

Once by the ServerConnection thread here:
{noformat}
[warn 2021/09/21 11:22:18.851 PDT server-1  tid=0x41] XXX CacheServerStats.incCurrentQueueConnections 
currentQueueConnectionsId=1
java.lang.Exception
at 
org.apache.geode.internal.cache.tier.sockets.CacheServerStats.incCurrentQueueConnections(CacheServerStats.java:660)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.handshakeAccepted(ServerConnection.java:705)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.acceptHandShake(ServerConnection.java:682)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.processHandShake(ServerConnection.java:613)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.verifyClientConnection(ServerConnection.java:404)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doHandshake(ServerConnection.java:787)
{noformat}
And once by the Client Queue Initialization Thread here:
{noformat}
[warn 2021/09/21 11:22:18.884 PDT server-1  tid=0x44] XXX CacheServerStats.incCurrentQueueConnections 
currentQueueConnectionsId=2
java.lang.Exception
at 
org.apache.geode.internal.cache.tier.sockets.CacheServerStats.incCurrentQueueConnections(CacheServerStats.java:660)
at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.<init>(CacheClientProxy.java:342)
at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.<init>(CacheClientProxy.java:306)
at 
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.registerClientInternal(CacheClientNotifier.java:379)
at 
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.registerClient(CacheClientNotifier.java:198)
at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$ClientQueueInitializerTask.run(AcceptorImpl.java:1896)
{noformat}
When the client disconnects from the server, the CacheServerStats 
currentQueueConnections statistic is decremented twice.

Once by the ServerConnection thread here:
{noformat}
[warn 2021/09/21 11:24:01.129 PDT server-1  tid=0x41] XXX CacheServerStats.decCurrentQueueConnections 
currentQueueConnectionsId=1
java.lang.Exception
at 
org.apache.geode.internal.cache.tier.sockets.CacheServerStats.decCurrentQueueConnections(CacheServerStats.java:665)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.handleTermination(ServerConnection.java:956)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.handleTermination(ServerConnection.java:929)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1289)
{noformat}
And once by a different ServerConnection thread here:
{noformat}
[warn 2021/09/21 11:24:01.135 PDT server-1  tid=0x42] XXX CacheServerStats.decCurrentQueueConnections 
currentQueueConnectionsId=0
java.lang.Exception
at 
org.apache.geode.internal.cache.tier.sockets.CacheServerStats.decCurrentQueueConnections(CacheServerStats.java:665)
at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.closeSocket(CacheClientProxy.java:939)
at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:895)
at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.close(CacheClientProxy.java:773)
at 
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.closeDeadProxies(CacheClientNotifier.java:1558)
at 
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.unregisterClient(CacheClientNotifier.java:572)
at 
org.apache.geode.internal.cache.tier.sockets.ClientHealthMonitor.unregisterClient(ClientHealthMonitor.java:268)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.handleTermination(ServerConnection.java:1008)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.handleTermination(ServerConnection.java:929)
at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1289)
{noformat}
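One generic way to make this kind of per-queue counter idempotent (a sketch 
only, not the product fix) is to guard the increment and decrement with a 
per-proxy flag so whichever path runs second becomes a no-op:
{noformat}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class QueueConnectionStat {
  private final AtomicInteger currentQueueConnections = new AtomicInteger();
  private final AtomicBoolean counted = new AtomicBoolean(false); // one per client queue

  void incOnce() {
    if (counted.compareAndSet(false, true)) { // only the first caller counts
      currentQueueConnections.incrementAndGet();
    }
  }

  void decOnce() {
    if (counted.compareAndSet(true, false)) { // only the first caller uncounts
      currentQueueConnections.decrementAndGet();
    }
  }

  int get() {
    return currentQueueConnections.get();
  }
}
{noformat}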




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-9528) CI Failure: DistributionAdvisorIntegrationTest > verifyMembershipListenerIsRemovedAfterForceDisconnect

2021-08-30 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-9528.

Resolution: Fixed

> CI Failure: DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect
> --
>
> Key: GEODE-9528
> URL: https://issues.apache.org/jira/browse/GEODE-9528
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Affects Versions: 1.12.5, 1.13.5, 1.14.0
>Reporter: Owen Nichols
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
>
> {noformat}
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect FAILED
> org.junit.ComparisonFailure: expected:<[fals]e> but was:<[tru]e>
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect(DistributionAdvisorIntegrationTest.java:57)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-9528) CI Failure: DistributionAdvisorIntegrationTest > verifyMembershipListenerIsRemovedAfterForceDisconnect

2021-08-24 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404068#comment-17404068
 ] 

Barrett Oglesby commented on GEODE-9528:


This is a test issue.

The test currently does:

1. force disconnect
2. assert the MembershipListener for the region is removed

The finally block of GMSMembership.ManagerImpl.forceDisconnect spins off the 
DisconnectThread to do the actual disconnect. The MembershipListener is removed 
by that thread.
{noformat}
} finally {
  new LoggingThread("DisconnectThread", false, () -> {
lifecycleListener.forcedDisconnect();
uncleanShutdown(reason, shutdownCause);
  }).start();
}
{noformat}
If there is a delay in the DisconnectThread processing, the test could fail.

Here is the normal flow:

1. The Test worker thread invokes forceDisconnectMember
2. The DisconnectThread removes the MembershipListener
3. The Test worker thread asserts the listener is removed

Here is logging that shows this behavior:
{noformat}
[warn 2021/08/24 13:38:47.445 PDT server  tid=0xb] XXX 
DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect
 before forceDisconnectMember

[warn 2021/08/24 13:38:47.452 PDT server  tid=0x32] XXX 
ManagerImpl.forceDisconnect about to uncleanShutdown

[warn 2021/08/24 13:38:47.472 PDT server  tid=0x32] XXX 
DistributionAdvisor.close removeMembershipListener 
advisee=verifyMembershipListenerIsRemovedAfterForceDisconnect

[warn 2021/08/24 13:38:47.566 PDT server  tid=0xb] XXX 
DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect
 after forceDisconnectMember

[warn 2021/08/24 13:38:47.567 PDT server  tid=0xb] XXX 
DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect
 assert
{noformat}
If the DisconnectThread has any kind of delay in running, the order changes, 
and the test fails:
{noformat}
[warn 2021/08/24 14:05:29.270 PDT server  tid=0xb] XXX 
DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect
 before forceDisconnectMember

[warn 2021/08/24 14:05:29.382 PDT server  tid=0x32] XXX 
ManagerImpl.forceDisconnect about to uncleanShutdown

[warn 2021/08/24 14:05:29.392 PDT server  tid=0xb] XXX 
DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect
 after forceDisconnectMember

[warn 2021/08/24 14:05:29.393 PDT server  tid=0xb] XXX 
DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect
 assert

[warn 2021/08/24 14:05:29.403 PDT server  tid=0x32] XXX 
DistributionAdvisor.close removeMembershipListener 
advisee=verifyMembershipListenerIsRemovedAfterForceDisconnect

org.junit.ComparisonFailure: 
Expecting value to be false but was true expected:<[fals]e> but was:<[tru]e>
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 
org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect(DistributionAdvisorIntegrationTest.java:62)
{noformat}
The fix is to add an await().untilAsserted call like this:
{noformat}
await().untilAsserted(
    () -> assertThat(manager.getMembershipListeners().contains(listener)).isFalse());
{noformat}
With the introduced delay, the test failed 80/100 times. With the await change, 
the test passed 100/100 times.
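
For reference, here is a minimal self-contained sketch of that pattern (assuming 
Awaitility and AssertJ on the classpath; the Manager interface is a hypothetical 
stand-in for the membership manager used by the test):
{noformat}
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

import java.time.Duration;
import java.util.Set;

class AwaitListenerRemoval {
  // Hypothetical stand-in for the membership manager used by the test.
  interface Manager {
    Set<Object> getMembershipListeners();
  }

  static void assertListenerRemoved(Manager manager, Object listener) {
    // Awaitility retries the assertion until it passes or the timeout
    // elapses, which tolerates the asynchronous DisconnectThread.
    await().atMost(Duration.ofSeconds(30)).untilAsserted(
        () -> assertThat(manager.getMembershipListeners()).doesNotContain(listener));
  }
}
{noformat}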


> CI Failure: DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect
> --
>
> Key: GEODE-9528
> URL: https://issues.apache.org/jira/browse/GEODE-9528
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Affects Versions: 1.12.5, 1.13.5, 1.14.0
>Reporter: Owen Nichols
>Assignee: Barrett Oglesby
>Priority: Major
>
> {noformat}
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect FAILED
> org.junit.ComparisonFailure: expected:<[fals]e> but was:<[tru]e>
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect(DistributionAdvisorIntegrationTest.java:57)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Assigned] (GEODE-9528) CI Failure: DistributionAdvisorIntegrationTest > verifyMembershipListenerIsRemovedAfterForceDisconnect

2021-08-24 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9528:
--

Assignee: Barrett Oglesby  (was: Ernest Burghardt)

> CI Failure: DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect
> --
>
> Key: GEODE-9528
> URL: https://issues.apache.org/jira/browse/GEODE-9528
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Affects Versions: 1.12.5, 1.13.5, 1.14.0
>Reporter: Owen Nichols
>Assignee: Barrett Oglesby
>Priority: Major
>
> {noformat}
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest > 
> verifyMembershipListenerIsRemovedAfterForceDisconnect FAILED
> org.junit.ComparisonFailure: expected:<[fals]e> but was:<[tru]e>
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.distributed.internal.DistributionAdvisorIntegrationTest.verifyMembershipListenerIsRemovedAfterForceDisconnect(DistributionAdvisorIntegrationTest.java:57)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-9441) The NestedFunctionExecutionDistributedTest uses too many threads

2021-07-20 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9441:
--

Assignee: Dale Emery

> The NestedFunctionExecutionDistributedTest uses too many threads
> 
>
> Key: GEODE-9441
> URL: https://issues.apache.org/jira/browse/GEODE-9441
> Project: Geode
>  Issue Type: Test
>  Components: tests
>Reporter: Barrett Oglesby
>Assignee: Dale Emery
>Priority: Major
>
> The {{NestedFunctionExecutionDistributedTest}} uses {{OperationExecutors 
> MAX_FE_THREADS}} to configure both client function invocations and cache 
> server max connections.
> It uses MAX_FE_THREADS * 2 for function executions which use Function 
> Execution Processor threads:
> {noformat}
> client.invoke(() -> executeFunction(new ParentFunction(), MAX_FE_THREADS * 
> 2));
> {noformat}
> And potentially MAX_FE_THREADS * 3 for client connections which use 
> ServerConnection threads:
> {noformat}
> cacheServer.setMaxConnections(Math.max(CacheServer.DEFAULT_MAX_CONNECTIONS, 
> MAX_FE_THREADS * 3));
> {noformat}
> MAX_FE_THREADS was changed recently to:
> {noformat}
> Math.max(Runtime.getRuntime().availableProcessors() * 16, 16))
> {noformat}
> It doesn't need to use this many threads to test the behavior it is testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-9441) The NestedFunctionExecutionDistributedTest uses too many threads

2021-07-20 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9441:
--

 Summary: The NestedFunctionExecutionDistributedTest uses too many 
threads
 Key: GEODE-9441
 URL: https://issues.apache.org/jira/browse/GEODE-9441
 Project: Geode
  Issue Type: Test
  Components: tests
Reporter: Barrett Oglesby


The {{NestedFunctionExecutionDistributedTest}} uses {{OperationExecutors 
MAX_FE_THREADS}} to configure both client function invocations and cache server 
max connections.

It uses MAX_FE_THREADS * 2 for function executions which use Function Execution 
Processor threads:
{noformat}
client.invoke(() -> executeFunction(new ParentFunction(), MAX_FE_THREADS * 2));
{noformat}
And potentially MAX_FE_THREADS * 3 for client connections which use 
ServerConnection threads:
{noformat}
cacheServer.setMaxConnections(Math.max(CacheServer.DEFAULT_MAX_CONNECTIONS, 
MAX_FE_THREADS * 3));
{noformat}
MAX_FE_THREADS was changed recently to:
{noformat}
Math.max(Runtime.getRuntime().availableProcessors() * 16, 16))
{noformat}
It doesn't need to use this many threads to test the behavior it is testing.
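
To make the arithmetic concrete, here is a rough sketch for a hypothetical 
16-core machine (assuming CacheServer.DEFAULT_MAX_CONNECTIONS is 800):
{noformat}
public class ThreadMath {
  public static void main(String[] args) {
    int cores = 16; // hypothetical availableProcessors()
    int maxFeThreads = Math.max(cores * 16, 16);          // 256
    int functionExecutions = maxFeThreads * 2;            // 512 client invocations
    int maxConnections = Math.max(800, maxFeThreads * 3); // max(800, 768) = 800
    System.out.println(functionExecutions + " executions, "
        + maxConnections + " max connections");
  }
}
{noformat}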



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-9392) A gfsh query returning a Struct containing a PdxInstance behaves differently than one returning just the PdxInstance in some cases

2021-07-14 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380896#comment-17380896
 ] 

Barrett Oglesby commented on GEODE-9392:


Code using ObjectMapper like the above didn't address the issue.

Since LocalDateTime is not supported by default in ObjectMapper, it throws this 
exception:
{noformat}
Caused by: java.lang.RuntimeException: Java 8 date/time type 
`java.time.LocalDateTime` not supported by default: add Module 
"com.fasterxml.jackson.datatype:jackson-datatype-jsr310" to enable handling
at 
org.apache.geode.pdx.internal.json.PdxToJSON.getJSON(PdxToJSON.java:67)
at 
org.apache.geode.pdx.JSONFormatter.fromPdxInstance(JSONFormatter.java:239)
{noformat}
I added jackson-datatype-jsr310-2.12.3.jar to the server's classpath and used 
this code in the last else clause in PdxToJSON.writeValue:
{noformat}
ObjectMapper mapper = new ObjectMapper();
mapper.findAndRegisterModules();
jg.writeString(mapper.writeValueAsString(value));
{noformat}
And that worked:
{noformat}
Executing - query --query='select * from /data'

Result : true
Limit  : 100
Rows   : 1

productId  | partnerProductId | onlineRelevance
-- |  | 
-
151895 | 151895   | 
{"value":"value1","valueChangeDate":"[2021,7,14,16,18,29,78400]"}
{noformat}
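For a standalone illustration of why that works, here is a minimal sketch 
(assuming jackson-databind and jackson-datatype-jsr310 on the classpath; 
registering JavaTimeModule explicitly is equivalent to what 
findAndRegisterModules() does when the jsr310 module is present):
{noformat}
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

import java.time.LocalDateTime;

public class Jsr310Example {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // Without this module, writeValueAsString(LocalDateTime) throws the
    // "Java 8 date/time type not supported by default" exception above.
    mapper.registerModule(new JavaTimeModule());
    // With WRITE_DATES_AS_TIMESTAMPS left at its default, this prints an
    // array-style value like [2021,7,14,16,18,29,78400000].
    System.out.println(mapper.writeValueAsString(LocalDateTime.now()));
  }
}
{noformat}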

> A gfsh query returning a Struct containing a PdxInstance behaves differently 
> than one returning just the PdxInstance in some cases
> --
>
> Key: GEODE-9392
> URL: https://issues.apache.org/jira/browse/GEODE-9392
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barrett Oglesby
>Priority: Major
>
> This is true when the PdxInstance contains a data type that is not supported 
> by PdxToJSON (like Date or Character).
> If objects like this are stored as PdxInstances:
> {noformat}
> public class Position {
>   private String id;
>   private Date tradeDate;
>   private Character type;
>   ...
> }
> {noformat}
> A query like this is successful:
> {noformat}
> Executing - query --query='select * from /positions'
> Result : true
> Limit  : 100
> Rows   : 10
>   tradeDate   | id | type
> - | -- | 
> 1624316618413 | 3  | "a"
> 1624316618324 | 0  | "a"
> 1624316618418 | 5  | "a"
> 1624316618421 | 6  | "a"
> 1624316618407 | 1  | "a"
> 1624316618426 | 8  | "a"
> 1624316618428 | 9  | "a"
> 1624316618415 | 4  | "a"
> 1624316618423 | 7  | "a"
> 1624316618410 | 2  | "a"
> {noformat}
> But a query like this is not:
> {noformat}
> Executing - query --query="select key,value from /positions.entries where 
> value.id = '0'"
> Result  : false
> Message : Could not create JSON document from PdxInstance
> {noformat}
> It fails with this exception in the server:
> {noformat}
> org.apache.geode.pdx.JSONFormatterException: Could not create JSON document 
> from PdxInstance
>   at 
> org.apache.geode.pdx.JSONFormatter.fromPdxInstance(JSONFormatter.java:241)
>   at org.apache.geode.pdx.JSONFormatter.toJSON(JSONFormatter.java:226)
>   at 
> org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.valueToJson(DataCommandResult.java:732)
>   at 
> org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.resolveStructToColumns(DataCommandResult.java:717)
>   at 
> org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.resolveObjectToColumns(DataCommandResult.java:692)
>   at 
> org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.createColumnValues(DataCommandResult.java:680)
>   at 
> org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.<init>(DataCommandResult.java:663)
>   at 
> org.apache.geode.management.internal.cli.functions.DataCommandFunction.createSelectResultRow(DataCommandFunction.java:270)
>   at 
> org.apache.geode.management.internal.cli.functions.DataCommandFunction.select_SelectResults(DataCommandFunction.java:256)
>   at 
> org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:224)
>   at 
> org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:177)
>   at 
> org.apache.geode.management.internal.cli.functions.DataCommandFunction.execute(DataCommandFunction.java:126)
> Caused by: java.lang.IllegalStateException: PdxInstance returns unknwon 
> pdxfield tradeDate for type Mon Jun 21 16:03:38 PDT 2021
>   at 
> org.apache.geode.pdx.internal.json.PdxToJSON.writeValue(PdxToJSON.java:148)
>   at 
> org.apache.geode.pdx.internal.json.PdxToJSON.getJSONString(PdxToJSON.java:185)
> 

[jira] [Commented] (GEODE-9392) A gfsh query returning a Struct containing a PdxInstance behaves differently than one returning just the PdxInstance in some cases

2021-07-14 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380890#comment-17380890
 ] 

Barrett Oglesby commented on GEODE-9392:


A {{select *}} query fails in the same way if a field of the value is a 
PdxInstance and that PdxInstance contains an unsupported data type.

For example, if the region contains Product objects and a Product contains a 
Relevance object which contains a java.time.LocalDateTime.

LocalDateTime is not supported by PdxToJSON, so the query fails with a stack 
similar to the struct one above:
{noformat}
[info 2021/07/14 15:53:00.349 PDT server1  
tid=0x3f] Exception occurred:
org.apache.geode.pdx.JSONFormatterException: Could not create JSON document 
from PdxInstance
at 
org.apache.geode.pdx.JSONFormatter.fromPdxInstance(JSONFormatter.java:241)
at org.apache.geode.pdx.JSONFormatter.toJSON(JSONFormatter.java:226)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.valueToJson(DataCommandResult.java:731)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.resolvePdxToColumns(DataCommandResult.java:711)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.resolveObjectToColumns(DataCommandResult.java:688)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.createColumnValues(DataCommandResult.java:680)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.<init>(DataCommandResult.java:663)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.createSelectResultRow(DataCommandFunction.java:270)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.select_SelectResults(DataCommandFunction.java:256)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:224)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:177)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.execute(DataCommandFunction.java:126)
Caused by: java.lang.IllegalStateException: The pdx field valueChangeDate has a 
value 2021-07-14T15:52:54.970 whose type class java.time.LocalDateTime can not 
be converted to JSON.
at 
org.apache.geode.pdx.internal.json.PdxToJSON.writeValue(PdxToJSON.java:148)
at 
org.apache.geode.pdx.internal.json.PdxToJSON.getJSONString(PdxToJSON.java:178)
at 
org.apache.geode.pdx.internal.json.PdxToJSON.getJSON(PdxToJSON.java:60)
{noformat}

> A gfsh query returning a Struct containing a PdxInstance behaves differently 
> than one returning just the PdxInstance in some cases
> --
>
> Key: GEODE-9392
> URL: https://issues.apache.org/jira/browse/GEODE-9392
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barrett Oglesby
>Priority: Major
>
> This is true when the PdxInstance contains a data type that is not supported 
> by PdxToJSON (like Date or Character).
> If objects like this are stored as PdxInstances:
> {noformat}
> public class Position {
>   private String id;
>   private Date tradeDate;
>   private Character type;
>   ...
> }
> {noformat}
> A query like this is successful:
> {noformat}
> Executing - query --query='select * from /positions'
> Result : true
> Limit  : 100
> Rows   : 10
>   tradeDate   | id | type
> - | -- | 
> 1624316618413 | 3  | "a"
> 1624316618324 | 0  | "a"
> 1624316618418 | 5  | "a"
> 1624316618421 | 6  | "a"
> 1624316618407 | 1  | "a"
> 1624316618426 | 8  | "a"
> 1624316618428 | 9  | "a"
> 1624316618415 | 4  | "a"
> 1624316618423 | 7  | "a"
> 1624316618410 | 2  | "a"
> {noformat}
> But a query like this is not:
> {noformat}
> Executing - query --query="select key,value from /positions.entries where 
> value.id = '0'"
> Result  : false
> Message : Could not create JSON document from PdxInstance
> {noformat}
> It fails with this exception in the server:
> {noformat}
> org.apache.geode.pdx.JSONFormatterException: Could not create JSON document 
> from PdxInstance
>   at 
> org.apache.geode.pdx.JSONFormatter.fromPdxInstance(JSONFormatter.java:241)
>   at org.apache.geode.pdx.JSONFormatter.toJSON(JSONFormatter.java:226)
>   at 
> org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.valueToJson(DataCommandResult.java:732)
>   at 
> org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.resolveStructToColumns(DataCommandResult.java:717)
>   at 
> 

[jira] [Created] (GEODE-9392) A gfsh query returning a Struct containing a PdxInstance behaves differently than one returning just the PdxInstance in some cases

2021-06-21 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9392:
--

 Summary: A gfsh query returning a Struct containing a PdxInstance 
behaves differently than one returning just the PdxInstance in some cases
 Key: GEODE-9392
 URL: https://issues.apache.org/jira/browse/GEODE-9392
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Reporter: Barrett Oglesby


This is true when the PdxInstance contains a data type that is not supported by 
PdxToJSON (like Date or Character).

If objects like this are stored as PdxInstances:
{noformat}
public class Position {
  private String id;
  private Date tradeDate;
  private Character type;
  ...
}
{noformat}
A query like this is successful:
{noformat}
Executing - query --query='select * from /positions'

Result : true
Limit  : 100
Rows   : 10

  tradeDate   | id | type
- | -- | 
1624316618413 | 3  | "a"
1624316618324 | 0  | "a"
1624316618418 | 5  | "a"
1624316618421 | 6  | "a"
1624316618407 | 1  | "a"
1624316618426 | 8  | "a"
1624316618428 | 9  | "a"
1624316618415 | 4  | "a"
1624316618423 | 7  | "a"
1624316618410 | 2  | "a"
{noformat}
But a query like this is not:
{noformat}
Executing - query --query="select key,value from /positions.entries where 
value.id = '0'"

Result  : false
Message : Could not create JSON document from PdxInstance
{noformat}
It fails with this exception in the server:
{noformat}
org.apache.geode.pdx.JSONFormatterException: Could not create JSON document 
from PdxInstance
at 
org.apache.geode.pdx.JSONFormatter.fromPdxInstance(JSONFormatter.java:241)
at org.apache.geode.pdx.JSONFormatter.toJSON(JSONFormatter.java:226)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.valueToJson(DataCommandResult.java:732)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.resolveStructToColumns(DataCommandResult.java:717)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.resolveObjectToColumns(DataCommandResult.java:692)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.createColumnValues(DataCommandResult.java:680)
at 
org.apache.geode.management.internal.cli.domain.DataCommandResult$SelectResultRow.<init>(DataCommandResult.java:663)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.createSelectResultRow(DataCommandFunction.java:270)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.select_SelectResults(DataCommandFunction.java:256)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:224)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:177)
at 
org.apache.geode.management.internal.cli.functions.DataCommandFunction.execute(DataCommandFunction.java:126)
Caused by: java.lang.IllegalStateException: PdxInstance returns unknwon 
pdxfield tradeDate for type Mon Jun 21 16:03:38 PDT 2021
at 
org.apache.geode.pdx.internal.json.PdxToJSON.writeValue(PdxToJSON.java:148)
at 
org.apache.geode.pdx.internal.json.PdxToJSON.getJSONString(PdxToJSON.java:185)
at 
org.apache.geode.pdx.internal.json.PdxToJSON.getJSON(PdxToJSON.java:61)
at 
org.apache.geode.pdx.JSONFormatter.fromPdxInstance(JSONFormatter.java:239)
{noformat}
It's because of the difference in processing a PdxInstance (first query) and a 
Struct (second query) in resolveObjectToColumns:
{noformat}
private void resolveObjectToColumns(Map<String, String> columnData, Object value) {
  if (value instanceof PdxInstance) {
resolvePdxToColumns(columnData, (PdxInstance) value);
  } else if (value instanceof Struct) {
resolveStructToColumns(columnData, (StructImpl) value);
  }
  ...
}
{noformat}
They both end up in SelectResultRow.valueToJson:
{noformat}
private String valueToJson(Object value) {
  ...
  if (value instanceof String) {
return (String) value;
  }

  if (value instanceof PdxInstance) {
return JSONFormatter.toJSON((PdxInstance) value);
  }

  ObjectMapper mapper = new ObjectMapper();
  try {
return mapper.writeValueAsString(value);
  } catch (JsonProcessingException jex) {
return jex.getMessage();
  }
}
{noformat}
In the PdxInstance case, the fields are passed in individually and handled by 
the first condition (String) and the ObjectMapper (Date, Character):
{noformat}
SelectResultRow.resolveObjectToColumns value=PDX[13681235,Position]{id=3, 
tradeDate=Mon Jun 21 16:03:38 PDT 2021, type=a}; valueClass=class 
org.apache.geode.pdx.internal.PdxInstanceImpl
SelectResultRow.valueToJson value=Mon Jun 21 16:03:38 PDT 2021; 
valueClass=class java.util.Date
SelectResultRow.valueToJson value=3; valueClass=class java.lang.String
SelectResultRow.valueToJson value=a; valueClass=class 

[jira] [Created] (GEODE-9390) DistributedSystem nodes is counted twice on each server member

2021-06-21 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9390:
--

 Summary: DistributedSystem nodes is counted twice on each server 
member
 Key: GEODE-9390
 URL: https://issues.apache.org/jira/browse/GEODE-9390
 Project: Geode
  Issue Type: Bug
  Components: membership
Reporter: Barrett Oglesby


Once in ClusterDistributionManager.startThreads:
{noformat}
[warn 2021/06/20 16:20:16.152 HST server-1  tid=0x1] 
ClusterDistributionManager.handleManagerStartup 
id=192.168.1.8(server-1:58386):41001; kind=10

[warn 2021/06/20 16:20:16.153 HST server-1  tid=0x1] 
DistributionStats.incNodes nodes=1
java.lang.Exception
at 
org.apache.geode.distributed.internal.DistributionStats.incNodes(DistributionStats.java:1362)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.handleManagerStartup(ClusterDistributionManager.java:1809)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.addNewMember(ClusterDistributionManager.java:1062)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.startThreads(ClusterDistributionManager.java:691)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:504)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:780)
{noformat}
And once in ClusterDistributionManager.create:
{noformat}
[warn 2021/06/20 16:20:16.155 HST server-1  tid=0x1] 
ClusterDistributionManager.handleManagerStartup 
id=192.168.1.8(server-1:58386):41001; kind=10

[warn 2021/06/20 16:20:16.156 HST server-1  tid=0x1] 
DistributionStats.incNodes nodes=2
java.lang.Exception
at 
org.apache.geode.distributed.internal.DistributionStats.incNodes(DistributionStats.java:1362)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.handleManagerStartup(ClusterDistributionManager.java:1809)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.addNewMember(ClusterDistributionManager.java:1062)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:354)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:780)
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-9372) DistributionStats needs a stat for create sender time to help diagnose data replication spikes

2021-06-21 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-9372.

Fix Version/s: 1.15.0
   Resolution: Fixed

> DistributionStats needs a stat for create sender time to help diagnose data 
> replication spikes
> --
>
> Key: GEODE-9372
> URL: https://issues.apache.org/jira/browse/GEODE-9372
> Project: Geode
>  Issue Type: Improvement
>  Components: statistics
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
> Fix For: 1.15.0
>
> Attachments: 
> PartitionedRegionStats_sendReplicationTime_DistributionStats_sendersTO.gif
>
>
> While debugging an issue with sendReplicationTime, we realized it was all due 
> to sender creation time.
> A statistic for that time would have been very useful.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9372) DistributionStats needs a stat for create sender time to help diagnose data replication spikes

2021-06-11 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-9372:
---
Labels: GeodeOperationAPI  (was: )

> DistributionStats needs a stat for create sender time to help diagnose data 
> replication spikes
> --
>
> Key: GEODE-9372
> URL: https://issues.apache.org/jira/browse/GEODE-9372
> Project: Geode
>  Issue Type: Improvement
>  Components: statistics
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: GeodeOperationAPI
> Attachments: 
> PartitionedRegionStats_sendReplicationTime_DistributionStats_sendersTO.gif
>
>
> While debugging an issue with sendReplicationTime, we realized it was all due 
> to sender creation time.
> A statistic for that time would have been very useful.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9372) DistributionStats needs a stat for create sender time to help diagnose data replication spikes

2021-06-11 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-9372:
---
Attachment: 
PartitionedRegionStats_sendReplicationTime_DistributionStats_sendersTO.gif

> DistributionStats needs a stat for create sender time to help diagnose data 
> replication spikes
> --
>
> Key: GEODE-9372
> URL: https://issues.apache.org/jira/browse/GEODE-9372
> Project: Geode
>  Issue Type: Improvement
>  Components: statistics
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
> Attachments: 
> PartitionedRegionStats_sendReplicationTime_DistributionStats_sendersTO.gif
>
>
> While debugging an issue with sendReplicationTime, we realized it was all due 
> to sender creation time.
> A statistic for that time would have been very useful.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-9372) DistributionStats needs a stat for create sender time to help diagnose data replication spikes

2021-06-11 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9372:
--

Assignee: Barrett Oglesby

> DistributionStats needs a stat for create sender time to help diagnose data 
> replication spikes
> --
>
> Key: GEODE-9372
> URL: https://issues.apache.org/jira/browse/GEODE-9372
> Project: Geode
>  Issue Type: Improvement
>  Components: statistics
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>
> While debugging an issue with sendReplicationTime, we realized it was all due 
> to sender creation time.
> A statistic for that time would have been very useful.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-9372) DistributionStats needs a stat for create sender time to help diagnose data replication spikes

2021-06-11 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9372:
--

 Summary: DistributionStats needs a stat for create sender time to 
help diagnose data replication spikes
 Key: GEODE-9372
 URL: https://issues.apache.org/jira/browse/GEODE-9372
 Project: Geode
  Issue Type: Improvement
  Components: statistics
Reporter: Barrett Oglesby


While debugging an issue with sendReplicationTime, we realized it was all due 
to sender creation time.

A statistic for that time would have been very useful.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-8825) CI failure: GatewayReceiverMBeanDUnitTest > testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy

2021-06-10 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-8825.

Fix Version/s: 1.15.0
   Resolution: Fixed

> CI failure: GatewayReceiverMBeanDUnitTest > 
> testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
> 
>
> Key: GEODE-8825
> URL: https://issues.apache.org/jira/browse/GEODE-8825
> Project: Geode
>  Issue Type: Bug
>  Components: tests, wan
>Reporter: Jianxia Chen
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: GeodeOperationAPI, flaky, pull-request-available
> Fix For: 1.15.0
>
>
> {code:java}
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest > 
> testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest$$Lambda$202/0x0001008f0c40.run
>  in VM 0 running on Host c3e48bdac460 with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:623)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:447)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy(GatewayReceiverMBeanDUnitTest.java:76)
> Caused by:
> java.lang.AssertionError: expected null, but was:<GemFire:service=GatewayReceiver,type=Member,member=172.17.0.18(183)-41002>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotNull(Assert.java:756)
> at org.junit.Assert.assertNull(Assert.java:738)
> at org.junit.Assert.assertNull(Assert.java:748)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.verifyMBeanProxiesDoesNotExist(GatewayReceiverMBeanDUnitTest.java:106)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.lambda$testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy$bb17a952$3(GatewayReceiverMBeanDUnitTest.java:76)
>  {code}
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/704
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0601/test-results/distributedTest/1610390301/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0601/test-artifacts/1610390301/distributedtestfiles-OpenJDK11-1.14.0-build.0601.tgz



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-8825) CI failure: GatewayReceiverMBeanDUnitTest > testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy

2021-06-09 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-8825:
--

Assignee: Barrett Oglesby

> CI failure: GatewayReceiverMBeanDUnitTest > 
> testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
> 
>
> Key: GEODE-8825
> URL: https://issues.apache.org/jira/browse/GEODE-8825
> Project: Geode
>  Issue Type: Bug
>  Components: tests, wan
>Reporter: Jianxia Chen
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: GeodeOperationAPI, flaky
>
> {code:java}
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest > 
> testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest$$Lambda$202/0x0001008f0c40.run
>  in VM 0 running on Host c3e48bdac460 with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:623)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:447)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy(GatewayReceiverMBeanDUnitTest.java:76)
> Caused by:
> java.lang.AssertionError: expected null, but was:<GemFire:service=GatewayReceiver,type=Member,member=172.17.0.18(183)-41002>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotNull(Assert.java:756)
> at org.junit.Assert.assertNull(Assert.java:738)
> at org.junit.Assert.assertNull(Assert.java:748)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.verifyMBeanProxiesDoesNotExist(GatewayReceiverMBeanDUnitTest.java:106)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.lambda$testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy$bb17a952$3(GatewayReceiverMBeanDUnitTest.java:76)
>  {code}
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/704
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0601/test-results/distributedTest/1610390301/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0601/test-artifacts/1610390301/distributedtestfiles-OpenJDK11-1.14.0-build.0601.tgz



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8825) CI failure: GatewayReceiverMBeanDUnitTest > testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy

2021-06-09 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-8825:
---
Labels: GeodeOperationAPI flaky  (was: flaky pull-request-available)

> CI failure: GatewayReceiverMBeanDUnitTest > 
> testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
> 
>
> Key: GEODE-8825
> URL: https://issues.apache.org/jira/browse/GEODE-8825
> Project: Geode
>  Issue Type: Bug
>  Components: tests, wan
>Reporter: Jianxia Chen
>Priority: Major
>  Labels: GeodeOperationAPI, flaky
>
> {code:java}
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest > 
> testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest$$Lambda$202/0x0001008f0c40.run
>  in VM 0 running on Host c3e48bdac460 with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:623)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:447)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy(GatewayReceiverMBeanDUnitTest.java:76)
> Caused by:
> java.lang.AssertionError: expected null, but was:<GemFire:service=GatewayReceiver,type=Member,member=172.17.0.18(183)-41002>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotNull(Assert.java:756)
> at org.junit.Assert.assertNull(Assert.java:738)
> at org.junit.Assert.assertNull(Assert.java:748)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.verifyMBeanProxiesDoesNotExist(GatewayReceiverMBeanDUnitTest.java:106)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.lambda$testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy$bb17a952$3(GatewayReceiverMBeanDUnitTest.java:76)
>  {code}
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/704
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0601/test-results/distributedTest/1610390301/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0601/test-artifacts/1610390301/distributedtestfiles-OpenJDK11-1.14.0-build.0601.tgz



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8825) CI failure: GatewayReceiverMBeanDUnitTest > testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy

2021-06-08 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359662#comment-17359662
 ] 

Barrett Oglesby commented on GEODE-8825:


Here is some logging that shows the behavior:

Creating the receiver causes it to get added to the federatedComponentMap:
{noformat}
[vm1] [warn 2021/06/08 16:36:38.288 PDT   
tid=0x13] XXX 
GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
 about to create receiver

[vm1] [warn 2021/06/08 16:36:38.370 PDT   
tid=0x13] XXX LocalManager.markForFederation added to federatedComponentMap 
objName=GemFire:service=GatewayReceiver

[vm1] [warn 2021/06/08 16:36:38.376 PDT   
tid=0x13] XXX 
GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
 created receiver
{noformat}
There is no logging to show it, but the Management Task hasn't run yet when the 
receiver is destroyed. The mbean is removed from the federatedComponentMap, but 
since the monitoringRegion doesn't contain the mbean yet, it doesn't get removed 
from that region. (The Management Task is what adds the mbean to that region, 
which is how it reaches the manager.)
{noformat}
[vm1] [warn 2021/06/08 16:36:38.382 PDT   
tid=0x13] XXX 
GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
 about to destroy receiver

[vm1] [warn 2021/06/08 16:36:38.384 PDT   
tid=0x13] XXX LocalManager.unMarkForFederation removed from 
federatedComponentMap objName=GemFire:service=GatewayReceiver

[vm1] [warn 2021/06/08 16:36:38.388 PDT   
tid=0x13] XXX LocalManager.unMarkForFederation monitoringRegionContains 
objName=GemFire:service=GatewayReceiver: false

[vm1] [warn 2021/06/08 16:36:38.389 PDT   
tid=0x13] XXX 
GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
 destroyed receiver
{noformat}
If I add a sleep between the create and destroy, I see better behavior.

Here is some logging that shows that.

The receiver is created the same as before:
{noformat}
[vm1] [warn 2021/06/08 16:35:40.970 PDT   
tid=0x13] XXX 
GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
 about to create receiver

[vm1] [warn 2021/06/08 16:35:41.054 PDT   
tid=0x13] XXX LocalManager.markForFederation added to federatedComponentMap 
objName=GemFire:service=GatewayReceiver,type=Member,member=192.168.1.4(12942)-41002

[vm1] [warn 2021/06/08 16:35:41.061 PDT   
tid=0x13] XXX 
GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
 created receiver
{noformat}
The Management Task puts the map into the monitoringRegion, and the manager adds the proxy:
{noformat}
[vm1] [warn 2021/06/08 16:35:41.775 PDT   tid=0x3e] XXX 
LocalManager.doManagementTask about to put 
replicaMap={GemFire:service=GatewayReceiver = GemFire:service=GatewayReceiver}

[vm0] [warn 2021/06/08 16:35:41.782 PDT  :41002 unshared ordered sender uid=6 dom #1 local 
port=60249 remote port=54707> tid=0x48] XXX MBeanAggregator.afterCreateProxy 
objectName=GemFire:service=GatewayReceiver,type=Member,member=192.168.1.4(12942)-41002
{noformat}
The receiver is destroyed. This time, the monitoringRegion contains the mbean, 
so it is removed from it:
{noformat}
[vm1] [warn 2021/06/08 16:35:44.072 PDT   
tid=0x13] XXX 
GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
 about to destroy receiver

[vm1] [warn 2021/06/08 16:35:44.074 PDT   
tid=0x13] XXX ManagementAdapter.handleGatewayReceiverDestroy 
objectName=GemFire:service=GatewayReceiver

[vm1] [warn 2021/06/08 16:35:44.075 PDT   
tid=0x13] XXX LocalManager.unMarkForFederation removed from 
federatedComponentMap objName=GemFire:service=GatewayReceiver

[vm1] [warn 2021/06/08 16:35:44.075 PDT   
tid=0x13] XXX LocalManager.unMarkForFederation monitoringRegionContains 
objName=GemFire:service=GatewayReceiver: true

[vm1] [warn 2021/06/08 16:35:44.079 PDT   
tid=0x13] XXX LocalManager.unMarkForFederation removed from monitoringRegion 
objName=GemFire:service=GatewayReceiver

[vm1] [warn 2021/06/08 16:35:44.079 PDT   
tid=0x13] XXX 
GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
 destroyed receiver
{noformat}
The proxy is removed from the manager:
{noformat}
[vm0] [warn 2021/06/08 16:35:44.082 PDT  :41002 unshared ordered sender uid=4 dom #1 local 
port=60249 remote port=54691> tid=0x41] XXX MBeanAggregator.afterRemoveProxy 
objectName=GemFire:service=GatewayReceiver,type=Member,member=192.168.1.4(12942)-41002
{noformat}
The test needs to be modified to wait for the manager to contain the proxy 
before destroying the receiver.
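
A minimal sketch of that wait, assuming Awaitility and a hypothetical lookup 
function that returns the proxy from the manager, or null if the federation 
task hasn't shipped it yet:
{noformat}
import static org.awaitility.Awaitility.await;

import java.util.function.Function;
import javax.management.ObjectName;

class WaitForProxy {
  // lookup is a hypothetical stand-in for however the test resolves the
  // GatewayReceiver mbean proxy on the manager member.
  static void waitForProxy(ObjectName name, Function<ObjectName, Object> lookup) {
    await().until(() -> lookup.apply(name) != null);
  }
}
{noformat}
Destroying the receiver only after this wait returns guarantees the subsequent 
verifyMBeanProxiesDoesNotExist check actually observes a proxy removal.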

> CI failure: GatewayReceiverMBeanDUnitTest > 
> testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
> 
>
> Key: GEODE-8825
> URL: 

[jira] [Commented] (GEODE-8825) CI failure: GatewayReceiverMBeanDUnitTest > testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy

2021-06-08 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359660#comment-17359660
 ] 

Barrett Oglesby commented on GEODE-8825:


This test doesn't really verify anything. It completes too quickly for the 
verifyMBeanProxiesDoesNotExist method to check anything meaningful.

For each member, the test does:

- create receiver
- start receiver
- stop receiver
- destroy receiver

Then it verifies in the manager that none of the mbean proxies exist.

The mbean is created when the receiver is created, and destroyed when the 
receiver is destroyed.

The problem is that creating the proxy in the manager is asynchronous to 
creating the mbean in the local member. There is a Management Task thread that 
runs (every 2 seconds) in each member and sends the mbeans to the manager.

So, after the steps above are complete, the mbean hasn't even been sent to the 
manager yet.

It's almost always going to pass, except in the case where the Management Task 
runs between the create and the verification. In that case, the proxies will 
exist, and the test will fail.

> CI failure: GatewayReceiverMBeanDUnitTest > 
> testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy
> 
>
> Key: GEODE-8825
> URL: https://issues.apache.org/jira/browse/GEODE-8825
> Project: Geode
>  Issue Type: Bug
>  Components: tests, wan
>Reporter: Jianxia Chen
>Priority: Major
>  Labels: flaky
>
> {code:java}
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest > 
> testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest$$Lambda$202/0x0001008f0c40.run
>  in VM 0 running on Host c3e48bdac460 with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:623)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:447)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy(GatewayReceiverMBeanDUnitTest.java:76)
> Caused by:
> java.lang.AssertionError: expected null, but was:<GemFire:service=GatewayReceiver,type=Member,member=172.17.0.18(183)-41002>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotNull(Assert.java:756)
> at org.junit.Assert.assertNull(Assert.java:738)
> at org.junit.Assert.assertNull(Assert.java:748)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.verifyMBeanProxiesDoesNotExist(GatewayReceiverMBeanDUnitTest.java:106)
> at 
> org.apache.geode.internal.cache.wan.GatewayReceiverMBeanDUnitTest.lambda$testMBeanAndProxiesForGatewayReceiverAreRemovedOnDestroy$bb17a952$3(GatewayReceiverMBeanDUnitTest.java:76)
>  {code}
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/704
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0601/test-results/distributedTest/1610390301/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0601/test-artifacts/1610390301/distributedtestfiles-OpenJDK11-1.14.0-build.0601.tgz



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-9299) CI Failure: WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover

2021-06-01 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-9299.

Fix Version/s: 1.15.0
   Resolution: Fixed

> CI Failure: 
> WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > 
> testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
> --
>
> Key: GEODE-9299
> URL: https://issues.apache.org/jira/browse/GEODE-9299
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.15.0
>Reporter: Hale Bales
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> {code:java}
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
>  > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from_v1.12.2] 
> FAILED
> java.lang.AssertionError: expected:<100> but was:<101>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.stopSenderAndVerifyEvents(WANRollingUpgradeDUnitTest.java:227)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover(WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.java:98)
> {code}
> CI Failure: 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/229#B
> Artifacts Available here: 
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0253/test-results/upgradeTest/1621635640/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-9299) CI Failure: WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover

2021-05-27 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352894#comment-17352894
 ] 

Barrett Oglesby commented on GEODE-9299:


If I simulate this behavior with a sleep on key=5 in Put65, I see the same 
extra event in the queue.

Keys 0-4 are processed normally in servers 1 and 2:

Server 1:
{noformat}
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=0
ServerConnection on port 57561 Thread 1: 
ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=0; shadowKey=113
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=1
P2P message reader for 10.166.145.22(ln-2:85040):41003 unshared ordered 
uid=8 dom #2 port=57607: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=1; shadowKey=114
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=2
P2P message reader for 10.166.145.22(ln-2:85040):41003 unshared ordered 
uid=8 dom #2 port=57607: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=2; shadowKey=115
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=3
ServerConnection on port 57561 Thread 1: 
ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=3; shadowKey=116
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=4
P2P message reader for 10.166.145.22(ln-2:85040):41003 unshared ordered 
uid=8 dom #2 port=57607: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=4; shadowKey=117
{noformat}
Server 2:
{noformat}
P2P message reader for 10.166.145.22(ln-1:85023):41002 unshared ordered 
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=0; shadowKey=113
P2P message reader for 10.166.145.22(ln-1:85023):41002 unshared ordered 
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=1; shadowKey=114
P2P message reader for 10.166.145.22(ln-1:85023):41002 unshared ordered 
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=2; shadowKey=115
P2P message reader for 10.166.145.22(ln-1:85023):41002 unshared ordered 
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=3; shadowKey=116
P2P message reader for 10.166.145.22(ln-1:85023):41002 unshared ordered 
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=4; shadowKey=117
{noformat}
The ServerConnection thread in server 1 sleeps before processing key=5:
{noformat}
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=5
ServerConnection on port 57561 Thread 1: Put65.cmdExecute sleeping key=5
{noformat}
The client times out, fails over to server 2, retries key=5, and continues with 
keys 6-9. Notice the event with key=5 has shadowKey=118. That's the key in the 
queue.
{noformat}
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing retried 
key=5
P2P message reader for 10.166.145.22(ln-1:85023):41002 unshared ordered 
uid=10 dom #2 port=57668: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=5; shadowKey=118
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=6
ServerConnection on port 57587 Thread 2: 
ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=6; shadowKey=119
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=7
P2P message reader for 10.166.145.22(ln-1:85023):41002 unshared ordered 
uid=10 dom #2 port=57668: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=7; shadowKey=120
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=8
ServerConnection on port 57587 Thread 2: 
ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=8; shadowKey=121
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=9
P2P message reader for 10.166.145.22(ln-1:85023):41002 unshared ordered 
uid=10 dom #2 port=57668: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=9; shadowKey=122
{noformat}
Server 1 enqueues keys 5-9:
{noformat}
P2P message reader for 10.166.145.22(ln-2:85040):41003 unshared ordered 
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=5; shadowKey=118
P2P message reader for 10.166.145.22(ln-2:85040):41003 unshared ordered 
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=6; shadowKey=119
P2P message reader for 10.166.145.22(ln-2:85040):41003 unshared ordered 
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=7; shadowKey=120
P2P message reader for 10.166.145.22(ln-2:85040):41003 unshared ordered 
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put 
dataKey=8; shadowKey=121
P2P message reader for 10.166.145.22(ln-2:85040):41003 unshared ordered 
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put 

[jira] [Commented] (GEODE-9299) CI Failure: WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover

2021-05-27 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352893#comment-17352893
 ] 

Barrett Oglesby commented on GEODE-9299:


The failing assertion verifies that the number of entries in the local secondary 
queues is 100 (which matches the number of puts). Instead, it is 101.
{noformat}
int localServer1QueueSize = localServer1.invoke(() -> 
getQueueRegionSize(senderId, false));
int localServer2QueueSize = localServer2.invoke(() -> 
getQueueRegionSize(senderId, false));
assertEquals(numPuts, localServer1QueueSize + localServer2QueueSize);
{noformat}
Here is some logging that shows the behavior in this test.

Client Starts:
{noformat}
[vm3_v1.12.2] [info 2021/05/21 21:12:16.982 GMT   tid=0x22] Received method: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x0001008afc40.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x0001008afc40@59079e6c
[vm3_v1.12.2] [info 2021/05/21 21:12:17.599 GMT   tid=0x22] Using 
org.apache.geode.logging.log4j.internal.impl.Log4jLoggingProvider from 
ServiceLoader for service org.apache.geode.logging.internal.spi.LoggingProvider
[vm3_v1.12.2] [info 2021/05/21 21:12:24.490 GMT   
tid=0x32] Updating membership port.  Port changed from 0 to 46166.  ID is now 
7e72072330df(13685:loner):0:6094c590
[vm3_v1.12.2] [info 2021/05/21 21:12:24.526 GMT   tid=0x22] Got result: null
[vm3_v1.12.2]  from 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x0001008afc40.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x0001008afc40@59079e6c
 (took 7538 ms)
{noformat}
Client does 100 puts in 22069ms with a SocketTimeoutException:
{noformat}
[vm3_v1.12.2] [info 2021/05/21 21:12:24.567 GMT   tid=0x22] Received method: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x000100959840.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x000100959840@2e8f97c1
[vm3_v1.12.2] [warn 2021/05/21 21:12:42.233 GMT   tid=0x22] Pool unexpected socket timed out on client 
connection=Pooled Connection to 7e72072330df:21250: 
Connection[7e72072330df:21250]@93891194)
[vm3_v1.12.2] [info 2021/05/21 21:12:46.638 GMT   tid=0x22] Got result: null
[vm3_v1.12.2]  from 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x000100959840.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x000100959840@2e8f97c1
 (took 22069 ms)
{noformat}
The SocketTimeoutException means the client retried the put, which results in 
two puts for the same event.
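
Here is a minimal sketch, assuming nothing about Geode's internals (the class and method names below are hypothetical, not Geode's API), of why such a retry can inflate the secondary queue count: the retried put carries the same EventID as the original, so a queue that does not track seen EventIDs enqueues it twice.
{noformat}
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a secondary queue that sees both the original
// put and the client's retry of the same event.
public class SecondaryQueueSketch {
  private final Set<String> seenEventIds = new HashSet<>();
  private int size;

  // Returns true if the event was enqueued, false if it was a duplicate.
  boolean enqueue(String eventId) {
    if (!seenEventIds.add(eventId)) {
      return false; // the retry carries the same EventID as the original put
    }
    size++;
    return true;
  }

  public static void main(String[] args) {
    SecondaryQueueSketch queue = new SecondaryQueueSketch();
    queue.enqueue("event-42"); // original put
    queue.enqueue("event-42"); // retry after the SocketTimeoutException
    // With EventID tracking the size stays 1; without it, the retry is
    // counted twice, which is the kind of extra entry behind the 101.
    System.out.println(queue.size);
  }
}
{noformat}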

Server 1 returns secondary queue size:
{noformat}
[vm1_v1.12.2] [info 2021/05/21 21:12:46.668 GMT   tid=0x22] Received method: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x000100ad1040.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x000100ad1040@79d1c376
[vm1_v1.12.2] [info 2021/05/21 21:12:47.598 GMT   tid=0x22] Got result: null
[vm1_v1.12.2]  from 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x000100ad1040.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x000100ad1040@79d1c376
 (took 929 ms)
{noformat}
Server 2 returns secondary queue size:
{noformat}
[vm2_v1.12.2] [info 2021/05/21 21:12:47.617 GMT   tid=0x22] Received method: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x000100ae2c40.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x000100ae2c40@751350b6
[vm2_v1.12.2] [info 2021/05/21 21:12:47.782 GMT   tid=0x22] Got result: null
[vm2_v1.12.2]  from 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x000100ae2c40.run
 with 0 args on object: 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x000100ae2c40@751350b6
 (took 161 ms)
{noformat}
The assertEquals check fails right after this, and the test shuts down. 

Here is some more detail.

Server 1 buckets are created:
{noformat}
[vm1_v1.12.2] [info 2021/05/21 21:12:24.771 GMT   tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
[vm1_v1.12.2] [info 2021/05/21 21:12:24.847 GMT   tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:25.418 GMT   tid=0x39] Initializing region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
[vm1_v1.12.2] [info 2021/05/21 21:12:25.439 GMT   tid=0x39] Initialization of region 
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
 completed
[vm1_v1.12.2] [info 2021/05/21 21:12:26.012 

[jira] [Assigned] (GEODE-9299) CI Failure: WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover

2021-05-27 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9299:
--

Assignee: Barrett Oglesby

> CI Failure: 
> WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover > 
> testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
> --
>
> Key: GEODE-9299
> URL: https://issues.apache.org/jira/browse/GEODE-9299
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.15.0
>Reporter: Hale Bales
>Assignee: Barrett Oglesby
>Priority: Major
>
> {code:java}
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
>  > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from_v1.12.2] 
> FAILED
> java.lang.AssertionError: expected:<100> but was:<101>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.stopSenderAndVerifyEvents(WANRollingUpgradeDUnitTest.java:227)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover(WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.java:98)
> {code}
> CI Failure: 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/229#B
> Artifacts Available here: 
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0253/test-results/upgradeTest/1621635640/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-9307) When a server is force disconnected, its regions can still be referenced

2021-05-24 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9307:
--

Assignee: Barrett Oglesby

> When a server is force disconnected, its regions can still be referenced
> 
>
> Key: GEODE-9307
> URL: https://issues.apache.org/jira/browse/GEODE-9307
> Project: Geode
>  Issue Type: Bug
>  Components: regions
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>
> When a server is force disconnected, any of its DistributedRegions will not 
> be GCed after they are closed. This is really only a problem if the 
> GemFireCacheImpl is referenced by something other than the 
> ClusterDistributionManager.cache field (in my test, I used a static field of 
> a Function).
> The GemFireCacheImpl references a ClusterDistributionManager in the final 
> field called dm.
> The DistributedRegion creates and references a DistributionAdvisor in the 
> final field called distAdvisor. The DistributionAdvisor creates a 
> MembershipListener and adds it to the ClusterDistributionManager's 
> membershipListeners.
> When the GemFireCacheImpl is closed due to force disconnect, its regions are 
> also closed.
> When a DistributedRegion is closed, its DistributionAdvisor is also closed.
> DistributionAdvisor.close attempts to remove the MembershipListener
> {noformat}
> try {
>   getDistributionManager().removeMembershipListener(membershipListener);
> } catch (CancelException e) {
>   // if distribution has stopped, above is a no-op.
> } ...
> {noformat}
> That call fails with a CancelException, and the MembershipListener is not 
> removed, so the ClusterDistributionManager references both the 
> GemFireCacheImpl and the MembershipListener. The MembershipListener 
> references the DistributionAdvisor which references the DistributedRegion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-9307) When a server is force disconnected, its regions can still be referenced

2021-05-24 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9307:
--

 Summary: When a server is force disconnected, its regions can 
still be referenced
 Key: GEODE-9307
 URL: https://issues.apache.org/jira/browse/GEODE-9307
 Project: Geode
  Issue Type: Bug
  Components: regions
Reporter: Barrett Oglesby


When a server is force disconnected, any of its DistributedRegions will not be 
GCed after they are closed. This is really only a problem if the 
GemFireCacheImpl is referenced by something other than the 
ClusterDistributionManager.cache field (in my test, I used a static field of a 
Function).

The GemFireCacheImpl references a ClusterDistributionManager in the final field 
called dm.

The DistributedRegion creates and references a DistributionAdvisor in the final 
field called distAdvisor. The DistributionAdvisor creates a MembershipListener 
and adds it to the ClusterDistributionManager's membershipListeners.

When the GemFireCacheImpl is closed due to force disconnect, its regions are 
also closed.

When a DistributedRegion is closed, its DistributionAdvisor is also closed.

DistributionAdvisor.close attempts to remove the MembershipListener
{noformat}
try {
  getDistributionManager().removeMembershipListener(membershipListener);
} catch (CancelException e) {
  // if distribution has stopped, above is a no-op.
} ...
{noformat}
That call fails with a CancelException, and the MembershipListener is not 
removed, so the ClusterDistributionManager references both the GemFireCacheImpl 
and the MembershipListener. The MembershipListener references the 
DistributionAdvisor which references the DistributedRegion.
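
A minimal, self-contained sketch of that reference chain (all names below are hypothetical stand-ins, not Geode's classes): a listener that captures the region is added to a long-lived manager, and a shutdown check skips the removal, so the manager pins the closed region.
{noformat}
import java.util.ArrayList;
import java.util.List;

// Stand-in for the long-lived ClusterDistributionManager.
class ManagerSketch {
  final List<Runnable> membershipListeners = new ArrayList<>();
  boolean distributionStopped;
}

// Stand-in for a DistributedRegion and its advisor's listener.
class RegionSketch {
  RegionSketch(ManagerSketch dm) {
    // The listener captures 'this', so the manager now references the region.
    dm.membershipListeners.add(() -> System.out.println("departed: " + this));
  }

  void close(ManagerSketch dm) {
    if (dm.distributionStopped) {
      // Mirrors the CancelException path: the removal is skipped, and the
      // long-lived manager keeps this closed region strongly reachable.
      return;
    }
    dm.membershipListeners.clear();
  }
}
{noformat}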




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-9138) Add warning in server logs when data event is ignored as a duplicate

2021-05-05 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-9138.

Fix Version/s: 1.15.0
   Resolution: Fixed

> Add warning in server logs when data event is ignored as a duplicate
> 
>
> Key: GEODE-9138
> URL: https://issues.apache.org/jira/browse/GEODE-9138
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, logging
>Reporter: Diane Hardman
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> Under certain rare conditions, a client may send or resend a data event with 
> an eventId that causes the server to interpret it as a duplicate event and 
> discard it.
> It is currently impossible to trace when this happens without extra logging 
> added.
> From Barry:
> No, if the server thinks it has seen the event, it silently eats it. See 
> DistributedEventTracker.hasSeenEvent. It has a trace log message, but that's it.
> The log message that tracks this behavior in the server needs to be added 
> permanently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-9104) REST query output displays non-ASCII characters using escapes

2021-05-05 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-9104.

Fix Version/s: 1.15.0
   Resolution: Fixed

> REST query output displays non-ASCII characters using escapes
> -
>
> Key: GEODE-9104
> URL: https://issues.apache.org/jira/browse/GEODE-9104
> Project: Geode
>  Issue Type: Bug
>  Components: rest (dev)
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> For example, if JSON containing Chinese characters is put:
> {noformat}
> curl -X PUT -H "Content-Type: application/json" 
> localhost:8081/geode/v1/customers/1 -d '{"id": "1", "firstName": "名", 
> "lastName": "姓"}'
> {noformat}
> The results of getting the entry are correct:
> {noformat}
> curl localhost:8081/geode/v1/customers/1
> {
>   "id" : "1",
>   "firstName" : "名",
>   "lastName" : "姓"
> }
> {noformat}
> The results of querying the entry show the field values escaped:
> {noformat}
> curl -G http://localhost:8081/gemfire-api/v1/queries/adhoc --data-urlencode 
> "q=SELECT * FROM /customers where id='1'"
> [ {
>   "id" : "1",
>   "firstName" : "\u540D",
>   "lastName" : "\u59D3"
> } ]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-9138) Add warning in server logs when data event is ignored as a duplicate

2021-04-29 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335733#comment-17335733
 ] 

Barrett Oglesby commented on GEODE-9138:


A lot of this change is about determining when to log the message and when not to.

There were 3 HA cases where the message was logged validly:

- posDup (for client message retries)
- low bucket redundancy (when a server crashes)
- recoveries in progress (when a server restarts)

The first change was to convert the message from debug to info. With that 
change, Lynn ran the parReg/parRegHABridge.bt test to see how many of those 
messages were logged. This test does a variety of operations while killing and 
restarting servers.

There were ~400 of those messages logged per test on the first run. These are 
all valid duplicates that we don't want to log if we can help it. We really 
only want to log this message in steady state (no HA).

The first case I noticed was low bucket redundancy. I also noticed that, at the 
same time, posDup was sometimes true and sometimes not. I realized every 
message in this low bucket redundancy state should have been posDup (they were 
all client retries), but posDup wasn't set on putAlls and removeAlls. So I made 
those changes and added the posDup case.

After that, there were still a handful of messages logged. That was because the 
region was being recovered. The messages were logged for a specific bucket 
right after it was GIIed, but the region was still in recovery, so I added that 
case.

I also ran some rebalance tests, but I didn't see any messages. I'm not 100% 
sure there aren't any though.
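
A hedged sketch of the gating described above (the method and parameter names are illustrative, not Geode's actual fields): the duplicate-event message is logged only in steady state, i.e. when none of the three expected-duplicate HA conditions holds.
{noformat}
// Sketch only; names are illustrative, not Geode's actual state.
class DuplicateLogGate {
  // Log the "ignored duplicate event" warning only when no HA condition
  // explains the duplicate.
  static boolean shouldLogDuplicate(boolean posDup, boolean lowBucketRedundancy,
      boolean recoveryInProgress) {
    // posDup: the client retried the operation (marked possible duplicate)
    // lowBucketRedundancy: a server crashed and redundancy is being restored
    // recoveryInProgress: a restarted server is still recovering its buckets
    return !posDup && !lowBucketRedundancy && !recoveryInProgress;
  }
}
{noformat}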

> Add warning in server logs when data event is ignored as a duplicate
> 
>
> Key: GEODE-9138
> URL: https://issues.apache.org/jira/browse/GEODE-9138
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, logging
>Reporter: Diane Hardman
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
>
> Under certain rare conditions, a client may send or resend a data event with 
> an eventId that causes the server to interpret it as a duplicate event and 
> discard it.
> It is currently impossible to trace when this happens without extra logging 
> added.
> From Barry:
> No, if the server thinks it has seen the event, it silently eats it. See 
> DistributedEventTracker.hasSeenEvent. It has a trace log message, but that's it.
> The log message that tracks this behavior in the server needs to be added 
> permanently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-9138) Add warning in server logs when data event is ignored as a duplicate

2021-04-29 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9138:
--

Assignee: Barrett Oglesby

> Add warning in server logs when data event is ignored as a duplicate
> 
>
> Key: GEODE-9138
> URL: https://issues.apache.org/jira/browse/GEODE-9138
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, logging
>Reporter: Diane Hardman
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
>
> Under certain rare conditions, a client may send or resend a data event with 
> an eventId that causes the server to interpret it as a duplicate event and 
> discard it.
> It is currently impossible to trace when this happens without extra logging 
> added.
> From Barry:
> No, if the server thinks it has seen the event, it silently eats it. See 
> DistributedEventTracker.hasSeenEvent. It has a trace log message, but that's it.
> The log message that tracks this behavior in the server needs to be added 
> permanently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9174) The result of a gfsh query containing a UUID may not be displayed properly

2021-04-19 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-9174:
---
Summary: The result of a gfsh query containing a UUID may not be displayed 
properly  (was: A gfsh query with a UUID in the result may not be displayed 
properly)

> The result of a gfsh query containing a UUID may not be displayed properly
> --
>
> Key: GEODE-9174
> URL: https://issues.apache.org/jira/browse/GEODE-9174
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, querying
>Reporter: Barrett Oglesby
>Priority: Major
>
> For example, if the key is a UUID, then a query like this won't show the 
> results even though there is one:
> {noformat}
> gfsh>query --query="select key from /data.entries where value.id = 
> '55e907b6-a1fe-42ea-90a2-6a5698e9b27c'"
> Result : true
> Limit  : 100
> Rows   : 1
> {noformat}
> But a query like this will:
> {noformat}
> gfsh>query --query="select key,value from /data.entries where value.id = 
> '55e907b6-a1fe-42ea-90a2-6a5698e9b27c'"
> Result : true
> Limit  : 100
> Rows   : 1
>  key   | value
> -- | 
> ---
> "55e907b6-a1fe-42ea-90a2-6a5698e9b27c" | 
> {"id":"55e907b6-a1fe-42ea-90a2-6a5698e9b27c","cusip":"AAPL","shares":22,"price":352.32}
> {noformat}
> That's because of the way {{DataCommandResult.resolveObjectToColumns}} works.
> {noformat}
> private void resolveObjectToColumns(Map<String, String> columnData, Object 
> value) {
>   if (value instanceof PdxInstance) {
> resolvePdxToColumns(columnData, (PdxInstance) value);
>   } else if (value instanceof Struct) {
> resolveStructToColumns(columnData, (StructImpl) value);
>   } else {
> ObjectMapper mapper = new ObjectMapper();
> JsonNode node = mapper.valueToTree(value);
> node.fieldNames().forEachRemaining(field -> {
>   ...
>   columnData.put(field, mapper.writeValueAsString(node.get(field)));
> });
>   }
> }
> {noformat}
> The value in the first query is a {{UUID}} so the last else clause is 
> invoked. In this case, a {{JsonNode}} is used to determine the columns. 
> {{ObjectMapper.valueToTree}} converts a {{UUID}} to a {{TextNode}}. 
> {{TextNodes}} have no fieldNames, and {{JsonNode.fieldNames}} returns an 
> {{EmptyIterator}} by default:
> {noformat}
> public Iterator<String> fieldNames() {
>   return ClassUtil.emptyIterator();
> }
> {noformat}
> So, {{resolveObjectToColumns}} doesn't fill in columnData, which causes the 
> {{DataCommandResult.buildTable}} in the locator to not add any rows to the 
> table.
> The value in the second query is a {{Struct}} so the second else clause is 
> invoked. The {{resolveStructToColumns}} method does:
> {noformat}
> private void resolveStructToColumns(Map<String, String> columnData, 
> StructImpl struct) {
>   for (String field : struct.getFieldNames()) {
> columnData.put(field, valueToJson(struct.get(field)));
>   }
> }
> {noformat}
> I'm not sure if there is a way to make {{ObjectMapper.valueToTree}} handle 
> {{UUIDs}} differently, but they can easily be special-cased like 
> {{PdxInstances}} and {{Structs}}:
> {noformat}
> } else if (value instanceof UUID) {
>   columnData.put("uuid", valueToJson(value));
> {noformat}
> I'm not sure if this is the best solution, but it works. With this clause 
> added, the query does:
> {noformat}
> gfsh>query --query="select key from /data.entries where value.id = 
> '55e907b6-a1fe-42ea-90a2-6a5698e9b27c'"
> Result : true
> Limit  : 100
> Rows   : 1
> uuid
> --
> "55e907b6-a1fe-42ea-90a2-6a5698e9b27c"
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-9174) A gfsh query with a UUID in the result may not be displayed properly

2021-04-19 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9174:
--

 Summary: A gfsh query with a UUID in the result may not be 
displayed properly
 Key: GEODE-9174
 URL: https://issues.apache.org/jira/browse/GEODE-9174
 Project: Geode
  Issue Type: Bug
  Components: gfsh, querying
Reporter: Barrett Oglesby


For example, if the key is a UUID, then a query like this won't show the 
results even though there is one:
{noformat}
gfsh>query --query="select key from /data.entries where value.id = 
'55e907b6-a1fe-42ea-90a2-6a5698e9b27c'"
Result : true
Limit  : 100
Rows   : 1
{noformat}
But a query like this will:
{noformat}
gfsh>query --query="select key,value from /data.entries where value.id = 
'55e907b6-a1fe-42ea-90a2-6a5698e9b27c'"
Result : true
Limit  : 100
Rows   : 1

 key   | value
-- | 
---
"55e907b6-a1fe-42ea-90a2-6a5698e9b27c" | 
{"id":"55e907b6-a1fe-42ea-90a2-6a5698e9b27c","cusip":"AAPL","shares":22,"price":352.32}
{noformat}
That's because of the way {{DataCommandResult.resolveObjectToColumns}} works.
{noformat}
private void resolveObjectToColumns(Map<String, String> columnData, Object 
value) {
  if (value instanceof PdxInstance) {
resolvePdxToColumns(columnData, (PdxInstance) value);
  } else if (value instanceof Struct) {
resolveStructToColumns(columnData, (StructImpl) value);
  } else {
ObjectMapper mapper = new ObjectMapper();
JsonNode node = mapper.valueToTree(value);
node.fieldNames().forEachRemaining(field -> {
  ...
  columnData.put(field, mapper.writeValueAsString(node.get(field)));
});
  }
}
{noformat}
The value in the first query is a {{UUID}} so the last else clause is invoked. 
In this case, a {{JsonNode}} is used to determine the columns. 
{{ObjectMapper.valueToTree}} converts a {{UUID}} to a {{TextNode}}. 
{{TextNodes}} have no fieldNames, and {{JsonNode.fieldNames}} returns an 
{{EmptyIterator}} by default:
{noformat}
public Iterator<String> fieldNames() {
  return ClassUtil.emptyIterator();
}
{noformat}
So, {{resolveObjectToColumns}} doesn't fill in columnData, which causes the 
{{DataCommandResult.buildTable}} in the locator to not add any rows to the 
table.

The value in the second query is a {{Struct}} so the second else clause is 
invoked. The {{resolveStructToColumns}} method does:
{noformat}
private void resolveStructToColumns(Map<String, String> columnData, StructImpl 
struct) {
  for (String field : struct.getFieldNames()) {
columnData.put(field, valueToJson(struct.get(field)));
  }
}
{noformat}
I'm not sure if there is a way to make {{ObjectMapper.valueToTree}} handle 
{{UUIDs}} differently, but they can easily be special-cased like 
{{PdxInstances}} and {{Structs}}:
{noformat}
} else if (value instanceof UUID) {
  columnData.put("uuid", valueToJson(value));
{noformat}
I'm not sure if this is the best solution, but it works. With this clause 
added, the query does:
{noformat}
gfsh>query --query="select key from /data.entries where value.id = 
'55e907b6-a1fe-42ea-90a2-6a5698e9b27c'"
Result : true
Limit  : 100
Rows   : 1

uuid
--
"55e907b6-a1fe-42ea-90a2-6a5698e9b27c"
{noformat}
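
Another option, if special-casing in resolveObjectToColumns is undesirable, might be to register a custom Jackson serializer so that valueToTree turns a UUID into an ObjectNode with a named field. The sketch below is only standard Jackson usage under that assumption, not a tested Geode change:
{noformat}
import java.io.IOException;
import java.util.UUID;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;

public class UuidColumnSketch {
  public static void main(String[] args) {
    SimpleModule module = new SimpleModule();
    module.addSerializer(UUID.class, new JsonSerializer<UUID>() {
      @Override
      public void serialize(UUID value, JsonGenerator gen,
          SerializerProvider serializers) throws IOException {
        // Wrap the UUID in an object so the resulting node has a field name.
        gen.writeStartObject();
        gen.writeStringField("uuid", value.toString());
        gen.writeEndObject();
      }
    });
    ObjectMapper mapper = new ObjectMapper().registerModule(module);

    JsonNode node = mapper.valueToTree(UUID.randomUUID());
    // fieldNames() is now non-empty, so resolveObjectToColumns would
    // fill in columnData instead of skipping the row.
    node.fieldNames().forEachRemaining(System.out::println); // prints "uuid"
  }
}
{noformat}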



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-9122) Setting group-transaction-events=true can cause ConcurrentModificationExceptions

2021-04-06 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9122:
--

Assignee: Alberto Gomez

> Setting group-transaction-events=true can cause 
> ConcurrentModificationExceptions
> 
>
> Key: GEODE-9122
> URL: https://issues.apache.org/jira/browse/GEODE-9122
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barrett Oglesby
>Assignee: Alberto Gomez
>Priority: Major
>
> The 
> SerialWANStatsDUnitTest.testReplicatedSerialPropagationHAWithGroupTransactionEvents
>  test can throw a ConcurrentModificationException like:
> {noformat}
> [warn 2021/04/04 02:55:53.253 GMT   
> tid=0x15d] An Exception occurred. The dispatcher will continue.
> java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
>   at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
>   at 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peekEventsFromIncompleteTransactions(SerialGatewaySenderQueue.java:476)
>   at 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peek(SerialGatewaySenderQueue.java:453)
>   at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue(AbstractGatewaySenderEventProcessor.java:518)
>   at 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run(SerialGatewaySenderEventProcessor.java:223)
> {noformat}
> If the SerialGatewaySenderQueue.peekEventsFromIncompleteTransactions contains 
> more than one TransactionId, and one of them is removed, the 
> ConcurrentModificationException will occur.
> Both the SerialGatewaySenderQueue and ParallelGatewaySenderQueue 
> peekEventsFromIncompleteTransactions have the same implementation.
> These methods do:
> {noformat}
>while (true) {
> 1. ->for (TransactionId transactionId : incompleteTransactionIdsInBatch) {
>...
>if (...) {
>   ...
> 2. -> incompleteTransactionIdsInBatch.remove(transactionId);
>}
>  }
>}
> {noformat}
> The for-each loop (1) cannot be paired with the remove from the 
> incompleteTransactionIdsInBatch set (2). As soon as the remove is called, the 
> ConcurrentModificationException will be thrown the next time through the 
> loop. Since this for loop is inside a while (true) loop, the exception recurs 
> indefinitely.
> One way to address this would be to use an Iterator and call remove on the 
> Iterator like:
> {noformat}
> 1. ->for (Iterator<TransactionId> i = 
> incompleteTransactionIdsInBatch.iterator(); i.hasNext();) {
>TransactionId transactionId = i.next();
>...
> 2. -> i.remove();
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-9122) Setting group-transaction-events=true can cause ConcurrentModificationExceptions

2021-04-06 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9122:
--

 Summary: Setting group-transaction-events=true can cause 
ConcurrentModificationExceptions
 Key: GEODE-9122
 URL: https://issues.apache.org/jira/browse/GEODE-9122
 Project: Geode
  Issue Type: Bug
  Components: wan
Reporter: Barrett Oglesby


The 
SerialWANStatsDUnitTest.testReplicatedSerialPropagationHAWithGroupTransactionEvents
 test can throw a ConcurrentModificationException like:
{noformat}
[warn 2021/04/04 02:55:53.253 GMT   
tid=0x15d] An Exception occurred. The dispatcher will continue.
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
at 
org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peekEventsFromIncompleteTransactions(SerialGatewaySenderQueue.java:476)
at 
org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peek(SerialGatewaySenderQueue.java:453)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue(AbstractGatewaySenderEventProcessor.java:518)
at 
org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run(SerialGatewaySenderEventProcessor.java:223)
{noformat}
If the SerialGatewaySenderQueue.peekEventsFromIncompleteTransactions contains 
more than one TransactionId, and one of them is removed, the 
ConcurrentModificationException will occur.

Both the SerialGatewaySenderQueue and ParallelGatewaySenderQueue 
peekEventsFromIncompleteTransactions have the same implementation.

These methods do:
{noformat}
   while (true) {
1. ->for (TransactionId transactionId : incompleteTransactionIdsInBatch) {
   ...
   if (...) {
  ...
2. -> incompleteTransactionIdsInBatch.remove(transactionId);
   }
 }
   }
{noformat}
The for-each loop (1) cannot be paired with the remove from the 
incompleteTransactionIdsInBatch set (2). As soon as the remove is called, the 
ConcurrentModificationException will be thrown the next time through the loop. 
Since this for loop is inside a while (true) loop, the exception recurs 
indefinitely.

One way to address this would be to use an Iterator and call remove on the 
Iterator like:
{noformat}
1. ->for (Iterator<TransactionId> i = 
incompleteTransactionIdsInBatch.iterator(); i.hasNext();) {
   TransactionId transactionId = i.next();
   ...
2. -> i.remove();
{noformat}
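
A minimal, standalone demonstration of the failure mode and the fix, using plain JDK collections (the set name is borrowed for readability; nothing here is Geode code):
{noformat}
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class IteratorRemoveSketch {
  public static void main(String[] args) {
    Set<String> incompleteTransactionIdsInBatch = new HashSet<>();
    incompleteTransactionIdsInBatch.add("tx-1");
    incompleteTransactionIdsInBatch.add("tx-2");

    // Broken: removing from the set inside a for-each throws
    // ConcurrentModificationException on the next iteration when more
    // than one element remains.
    // for (String txId : incompleteTransactionIdsInBatch) {
    //   incompleteTransactionIdsInBatch.remove(txId);
    // }

    // Fixed: remove through the Iterator instead.
    for (Iterator<String> i = incompleteTransactionIdsInBatch.iterator(); i.hasNext();) {
      String txId = i.next();
      if (txId.equals("tx-1")) {
        i.remove();
      }
    }
    System.out.println(incompleteTransactionIdsInBatch); // [tx-2]
  }
}
{noformat}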



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-9104) REST query output displays non-ASCII characters using escapes

2021-03-30 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311707#comment-17311707
 ] 

Barrett Oglesby commented on GEODE-9104:


The query executes this code path:
{noformat}
java.lang.Exception
at 
org.apache.geode.rest.internal.web.util.JSONUtils.enableDisableJSONGeneratorFeature(JSONUtils.java:57)
at 
org.apache.geode.rest.internal.web.util.JSONUtils.convertCollectionToJson(JSONUtils.java:141)
at 
org.apache.geode.rest.internal.web.controllers.AbstractBaseController.processQueryResponse(AbstractBaseController.java:243)
at 
org.apache.geode.rest.internal.web.controllers.QueryAccessController.runNamedQuery(QueryAccessController.java:262)
{noformat}
JSONUtils creates a JsonGenerator like:
{noformat}
getObjectMapper().getFactory().createGenerator((OutputStream) outputStream, 
JsonEncoding.UTF8)
{noformat}
It then enables the ESCAPE_NON_ASCII feature:
{noformat}
generator.enable(JsonWriteFeature.ESCAPE_NON_ASCII.mappedFeature());
{noformat}
This is what causes the Chinese characters to be escaped.

The get creates a RegionData in this code path:
{noformat}
java.lang.Exception: RegionData.RegionData
at 
org.apache.geode.rest.internal.web.controllers.support.RegionData.<init>(RegionData.java:59)
at 
org.apache.geode.rest.internal.web.controllers.support.RegionEntryData.<init>(RegionEntryData.java:48)
at 
org.apache.geode.rest.internal.web.controllers.PdxBasedCrudController.getRegionKeys(PdxBasedCrudController.java:260)
at 
org.apache.geode.rest.internal.web.controllers.PdxBasedCrudController.read(PdxBasedCrudController.java:243)
{noformat}
The RegionData is serialized here:
{noformat}
java.lang.Exception: RegionData.serialize
at 
org.apache.geode.rest.internal.web.controllers.support.RegionData.serialize(RegionData.java:131)
at 
com.fasterxml.jackson.databind.ser.std.SerializableSerializer.serialize(SerializableSerializer.java:39)
at 
com.fasterxml.jackson.databind.ser.std.SerializableSerializer.serialize(SerializableSerializer.java:20)
at 
com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
at 
com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
at 
com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1514)
at 
com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:1006)
at 
org.springframework.http.converter.json.AbstractJackson2HttpMessageConverter.writeInternal(AbstractJackson2HttpMessageConverter.java:454)
at 
org.springframework.http.converter.AbstractGenericHttpMessageConverter.write(AbstractGenericHttpMessageConverter.java:104)
{noformat}
AbstractJackson2HttpMessageConverter.writeInternal creates a JsonGenerator like 
this:
{noformat}
objectMapper.getFactory().createGenerator(outputStream, encoding)
{noformat}
This is the same as in JSONUtils, except that ESCAPE_NON_ASCII is not enabled 
in this case.
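
A standalone illustration of what that feature does, using plain Jackson rather than Geode's JSONUtils (a sketch of the feature's effect, not the actual REST code path):
{noformat}
import java.util.Collections;
import java.util.Map;

import com.fasterxml.jackson.core.json.JsonWriteFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EscapeNonAsciiSketch {
  public static void main(String[] args) throws Exception {
    Map<String, String> value = Collections.singletonMap("firstName", "名");

    // Default factory settings: non-ASCII characters pass through as UTF-8.
    ObjectMapper plain = new ObjectMapper();
    System.out.println(plain.writeValueAsString(value));    // {"firstName":"名"}

    // With ESCAPE_NON_ASCII enabled, as in the query path, they are escaped.
    ObjectMapper escaping = new ObjectMapper();
    escaping.getFactory().enable(JsonWriteFeature.ESCAPE_NON_ASCII.mappedFeature());
    System.out.println(escaping.writeValueAsString(value)); // {"firstName":"\u540D"}
  }
}
{noformat}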


> REST query output displays non-ASCII characters using escapes
> -
>
> Key: GEODE-9104
> URL: https://issues.apache.org/jira/browse/GEODE-9104
> Project: Geode
>  Issue Type: Bug
>  Components: rest (dev)
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>
> For example, if JSON containing Chinese characters is put:
> {noformat}
> curl -X PUT -H "Content-Type: application/json" 
> localhost:8081/geode/v1/customers/1 -d '{"id": "1", "firstName": "名", 
> "lastName": "姓"}'
> {noformat}
> The results of getting the entry are correct:
> {noformat}
> curl localhost:8081/geode/v1/customers/1
> {
>   "id" : "1",
>   "firstName" : "名",
>   "lastName" : "姓"
> }
> {noformat}
> The results of querying the entry show the field values escaped:
> {noformat}
> curl -G http://localhost:8081/gemfire-api/v1/queries/adhoc --data-urlencode 
> "q=SELECT * FROM /customers where id='1'"
> [ {
>   "id" : "1",
>   "firstName" : "\u540D",
>   "lastName" : "\u59D3"
> } ]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-9104) REST query output displays non-ASCII characters using escapes

2021-03-30 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9104:
--

Assignee: Barrett Oglesby

> REST query output displays non-ASCII characters using escapes
> -
>
> Key: GEODE-9104
> URL: https://issues.apache.org/jira/browse/GEODE-9104
> Project: Geode
>  Issue Type: Bug
>  Components: rest (dev)
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>
> For example, if JSON containing Chinese characters is put:
> {noformat}
> curl -X PUT -H "Content-Type: application/json" 
> localhost:8081/geode/v1/customers/1 -d '{"id": "1", "firstName": "名", 
> "lastName": "姓"}'
> {noformat}
> The results of getting the entry are correct:
> {noformat}
> curl localhost:8081/geode/v1/customers/1
> {
>   "id" : "1",
>   "firstName" : "名",
>   "lastName" : "姓"
> }
> {noformat}
> The results of querying the entry show the field values escaped:
> {noformat}
> curl -G http://localhost:8081/gemfire-api/v1/queries/adhoc --data-urlencode 
> "q=SELECT * FROM /customers where id='1'"
> [ {
>   "id" : "1",
>   "firstName" : "\u540D",
>   "lastName" : "\u59D3"
> } ]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-9104) REST query output displays non-ASCII characters using escapes

2021-03-30 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9104:
--

 Summary: REST query output displays non-ASCII characters using 
escapes
 Key: GEODE-9104
 URL: https://issues.apache.org/jira/browse/GEODE-9104
 Project: Geode
  Issue Type: Bug
  Components: rest (dev)
Reporter: Barrett Oglesby


For example, if JSON containing Chinese characters is put:
{noformat}
curl -X PUT -H "Content-Type: application/json" 
localhost:8081/geode/v1/customers/1 -d '{"id": "1", "firstName": "名", 
"lastName": "姓"}'
{noformat}
The results of getting the entry are correct:
{noformat}
curl localhost:8081/geode/v1/customers/1
{
  "id" : "1",
  "firstName" : "名",
  "lastName" : "姓"
}
{noformat}
The results of querying the entry show the field values escaped:
{noformat}
curl -G http://localhost:8081/gemfire-api/v1/queries/adhoc --data-urlencode 
"q=SELECT * FROM /customers where id='1'"
[ {
  "id" : "1",
  "firstName" : "\u540D",
  "lastName" : "\u59D3"
} ]
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9030) The PartitionedIndex arbitraryBucketIndex doesn't get reset when the BucketRegion defining it is moved

2021-03-24 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-9030:
---
Labels: blocks-1.14.0 pull-request-available  (was: pull-request-available)

> The PartitionedIndex arbitraryBucketIndex doesn't get reset when the 
> BucketRegion defining it is moved
> --
>
> Key: GEODE-9030
> URL: https://issues.apache.org/jira/browse/GEODE-9030
> Project: Geode
>  Issue Type: Bug
>  Components: querying
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: blocks-1.14.0, pull-request-available
>
> This causes a RegionDestroyedException like this when executing a query 
> containing a != clause:
> {noformat}
> Exception in thread "main" 
> org.apache.geode.cache.client.ServerOperationException: remote server on 
> 10.166.145.16(client:27461:loner):58776:dfd3ba27:client: While performing a 
> remote query
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.processChunkedResponse(AbstractOp.java:342)
>   at 
> org.apache.geode.cache.client.internal.QueryOp$QueryOpImpl.processResponse(QueryOp.java:168)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:224)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:197)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:384)
>   at 
> org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:284)
>   at 
> org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:355)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:756)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:142)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:112)
>   at 
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:797)
>   at 
> org.apache.geode.cache.client.internal.QueryOp.execute(QueryOp.java:59)
>   at 
> org.apache.geode.cache.client.internal.ServerProxy.query(ServerProxy.java:59)
>   at 
> org.apache.geode.cache.query.internal.DefaultQuery.executeOnServer(DefaultQuery.java:327)
>   at 
> org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:215)
>   at 
> org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:197)
> Caused by: org.apache.geode.cache.query.QueryInvocationTargetException: The 
> Region on which query is executed may have been 
> destroyed.BucketRegion[path='/__PR/_B__trade_0;serial=12;primary=false]
>   at 
> org.apache.geode.internal.cache.PRQueryProcessor.executeQueryOnBuckets(PRQueryProcessor.java:264)
>   at 
> org.apache.geode.internal.cache.PRQueryProcessor.executeSequentially(PRQueryProcessor.java:214)
>   at 
> org.apache.geode.internal.cache.PRQueryProcessor.executeQuery(PRQueryProcessor.java:124)
>   at 
> org.apache.geode.internal.cache.partitioned.QueryMessage.operateOnPartitionedRegion(QueryMessage.java:210)
> Caused by: org.apache.geode.cache.RegionDestroyedException: 
> BucketRegion[path='/__PR/_B__trade_0;serial=12;primary=false]
>   at 
> org.apache.geode.internal.cache.LocalRegion.checkRegionDestroyed(LocalRegion.java:7352)
>   at 
> org.apache.geode.internal.cache.LocalRegion.checkReadiness(LocalRegion.java:2757)
>   at 
> org.apache.geode.internal.cache.BucketRegion.checkReadiness(BucketRegion.java:1437)
>   at 
> org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8313)
>   at 
> org.apache.geode.cache.query.internal.index.CompactRangeIndex.getSizeEstimate(CompactRangeIndex.java:331)
>   at 
> org.apache.geode.cache.query.internal.CompiledComparison.getSizeEstimate(CompiledComparison.java:337)
>   at 
> org.apache.geode.cache.query.internal.GroupJunction.organizeOperands(GroupJunction.java:146)
>   at 
> org.apache.geode.cache.query.internal.AbstractGroupOrRangeJunction.filterEvaluate(AbstractGroupOrRangeJunction.java:148)
>   at 
> org.apache.geode.cache.query.internal.CompiledJunction.filterEvaluate(CompiledJunction.java:190)
>   at 
> org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:538)
>   at 
> org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:53)
>   at 
> org.apache.geode.cache.query.internal.DefaultQuery.executeUsingContext(DefaultQuery.java:357)
>   at 
> org.apache.geode.internal.cache.PRQueryProcessor.executeQueryOnBuckets(PRQueryProcessor.java:248)
> {noformat}
> Here is an 

[jira] [Assigned] (GEODE-9030) The PartitionedIndex arbitraryBucketIndex doesn't get reset when the BucketRegion defining it is moved

2021-03-22 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9030:
--

Assignee: Barrett Oglesby

> The PartitionedIndex arbitraryBucketIndex doesn't get reset when the 
> BucketRegion defining it is moved
> --
>
> Key: GEODE-9030
> URL: https://issues.apache.org/jira/browse/GEODE-9030
> Project: Geode
>  Issue Type: Bug
>  Components: querying
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
>
> This causes a RegionDestroyedException like this when executing a query 
> containing a != clause:
> {noformat}
> Exception in thread "main" 
> org.apache.geode.cache.client.ServerOperationException: remote server on 
> 10.166.145.16(client:27461:loner):58776:dfd3ba27:client: While performing a 
> remote query
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.processChunkedResponse(AbstractOp.java:342)
>   at 
> org.apache.geode.cache.client.internal.QueryOp$QueryOpImpl.processResponse(QueryOp.java:168)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:224)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:197)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:384)
>   at 
> org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:284)
>   at 
> org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:355)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:756)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:142)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:112)
>   at 
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:797)
>   at 
> org.apache.geode.cache.client.internal.QueryOp.execute(QueryOp.java:59)
>   at 
> org.apache.geode.cache.client.internal.ServerProxy.query(ServerProxy.java:59)
>   at 
> org.apache.geode.cache.query.internal.DefaultQuery.executeOnServer(DefaultQuery.java:327)
>   at 
> org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:215)
>   at 
> org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:197)
> Caused by: org.apache.geode.cache.query.QueryInvocationTargetException: The 
> Region on which query is executed may have been 
> destroyed.BucketRegion[path='/__PR/_B__trade_0;serial=12;primary=false]
>   at 
> org.apache.geode.internal.cache.PRQueryProcessor.executeQueryOnBuckets(PRQueryProcessor.java:264)
>   at 
> org.apache.geode.internal.cache.PRQueryProcessor.executeSequentially(PRQueryProcessor.java:214)
>   at 
> org.apache.geode.internal.cache.PRQueryProcessor.executeQuery(PRQueryProcessor.java:124)
>   at 
> org.apache.geode.internal.cache.partitioned.QueryMessage.operateOnPartitionedRegion(QueryMessage.java:210)
> Caused by: org.apache.geode.cache.RegionDestroyedException: 
> BucketRegion[path='/__PR/_B__trade_0;serial=12;primary=false]
>   at 
> org.apache.geode.internal.cache.LocalRegion.checkRegionDestroyed(LocalRegion.java:7352)
>   at 
> org.apache.geode.internal.cache.LocalRegion.checkReadiness(LocalRegion.java:2757)
>   at 
> org.apache.geode.internal.cache.BucketRegion.checkReadiness(BucketRegion.java:1437)
>   at 
> org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8313)
>   at 
> org.apache.geode.cache.query.internal.index.CompactRangeIndex.getSizeEstimate(CompactRangeIndex.java:331)
>   at 
> org.apache.geode.cache.query.internal.CompiledComparison.getSizeEstimate(CompiledComparison.java:337)
>   at 
> org.apache.geode.cache.query.internal.GroupJunction.organizeOperands(GroupJunction.java:146)
>   at 
> org.apache.geode.cache.query.internal.AbstractGroupOrRangeJunction.filterEvaluate(AbstractGroupOrRangeJunction.java:148)
>   at 
> org.apache.geode.cache.query.internal.CompiledJunction.filterEvaluate(CompiledJunction.java:190)
>   at 
> org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:538)
>   at 
> org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:53)
>   at 
> org.apache.geode.cache.query.internal.DefaultQuery.executeUsingContext(DefaultQuery.java:357)
>   at 
> org.apache.geode.internal.cache.PRQueryProcessor.executeQueryOnBuckets(PRQueryProcessor.java:248)
> {noformat}
> Here is an example query that fails:
> {noformat}
> SELECT * FROM /trade 

[jira] [Resolved] (GEODE-9040) The SingleThreadColocationLogger executorService is not shutdown when the server is stopped

2021-03-22 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-9040.

Fix Version/s: 1.14.0
   Resolution: Fixed

> The SingleThreadColocationLogger executorService is not shutdown when the 
> server is stopped
> ---
>
> Key: GEODE-9040
> URL: https://issues.apache.org/jira/browse/GEODE-9040
> Project: Geode
>  Issue Type: Bug
>  Components: logging
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> When a server is shut down, its JVM remains alive because the ExecutorService 
> created by the SingleThreadColocationLogger is not terminated nor is its 
> thread a daemon:
> {noformat}
> "ColocationLogger for customer" #57 prio=5 os_prio=31 tid=0x7fb39d4e4000 
> nid=0xb203 waiting on condition [0x7dc58000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x000785268818> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The SingleThreadColocationLogger only gets created when there are missing 
> co-located regions.
> We can either terminate the ExecutorService or make its thread a daemon or 
> both.
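
A sketch of both remedies together (names are hypothetical, not the actual SingleThreadColocationLogger change): build the executor with a daemon thread so the JVM can exit, and shut it down explicitly on stop.
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ColocationLoggerSketch {
  private final ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
    Thread thread = new Thread(runnable, "ColocationLogger for customer");
    thread.setDaemon(true); // a parked daemon worker no longer keeps the JVM alive
    return thread;
  });

  void start(Runnable loggingTask) {
    executor.submit(loggingTask);
  }

  void stop() throws InterruptedException {
    executor.shutdownNow(); // interrupt the worker when the server stops
    executor.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{noformat}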



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-9043) A register interest attempt from a newer client to an older server throws a NoSubscriptionServersAvailableException instead of a ServerRefusedConnectionException

2021-03-17 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-9043:
---
Description: 
The exception in the register interest case is a bit confusing.

If a 1.13.2 client attempts to connect to a 1.13.0 server and do a put, it 
throws this ServerRefusedConnectionException with the exact cause:
{noformat}
Exception in thread "main" 
org.apache.geode.cache.client.NoAvailableServersException: 
org.apache.geode.cache.client.ServerRefusedConnectionException: 
nn.nnn.nnn.nn(3047):41001(version:GEODE 1.13.0) refused connection: Peer or 
client version with ordinal 121 not supported. Highest known version is 1.13.0 
Client: /nn.nnn.nnn.nn:64123.
at 
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:200)
at 
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:273)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:128)
at 
org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:796)
at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:91)
{noformat}
If the client attempts to registerInterest, it throws this 
NoSubscriptionServersAvailableException:
{noformat}
Exception in thread "main" 
org.apache.geode.cache.NoSubscriptionServersAvailableException: 
org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not 
initialize a primary queue on startup. No queue servers available.
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:190)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:432)
at 
org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:870)
at 
org.apache.geode.cache.Region.registerInterestForAllKeys(Region.java:1657)
{noformat}
The log does contain a message like the one below, so the exact cause can be 
determined there, but not from the exception:
{noformat}
[warn 2021/03/15 11:59:04.100 PDT client  tid=0x1] Could not create a new 
connection to server: nn.nnn.nnn.nn(9838):41001(version:GEODE 1.13.0) 
refused connection: Peer or client version with ordinal 121 not supported. 
Highest known version is 1.13.0 Client: /nn.nnn.nnn.nn:65323.
{noformat}

  was:
The exception in the register interest case is a bit confusing.

If a 1.13.2 client attempts to connect to a 1.13.0 server and do a put, it 
throws this ServerRefusedConnectionException with the exact cause:
{noformat}
Exception in thread "main" 
org.apache.geode.cache.client.NoAvailableServersException: 
org.apache.geode.cache.client.ServerRefusedConnectionException: 
nn.nnn.nnn.nn(3047):41001(version:GEODE 1.13.0) refused connection: Peer or 
client version with ordinal 121 not supported. Highest known version is 1.13.0 
Client: /nn.nnn.nnn.nn:64123.
at 
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:200)
at 
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:273)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:128)
at 
org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:796)
at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:91)
{noformat}
If the client attempts to registerInterest, it throws this 
NoSubscriptionServersAvailableException:
{noformat}
Exception in thread "main" 
org.apache.geode.cache.NoSubscriptionServersAvailableException: 
org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not 
initialize a primary queue on startup. No queue servers available.
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:190)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:432)
at 
org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:870)
at 
org.apache.geode.cache.Region.registerInterestForAllKeys(Region.java:1657)
{noformat}
The log does contain a message like below so it can be determined the exact 
cause, buts not in the exception:
{noformat}
[warn 2021/03/15 11:59:04.100 PDT client  tid=0x1] Could not create a new 
connection to server: nn.nnn.nnn.nn(9838):41001(version:GEODE 1.13.0) 
refused connection: Peer or client version with ordinal 121 not supported. 
Highest known version is 1.13.0 Client: /nn.nnn.nnn.nn:65323.
{noformat}



> A register interest attempt from a newer client to an older server throws a 
> NoSubscriptionServersAvailableException 

[jira] [Commented] (GEODE-9043) A register interest attempt from a newer client to an older server throws a NoSubscriptionServersAvailableException instead of a ServerRefusedConnectionException

2021-03-16 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302741#comment-17302741
 ] 

Barrett Oglesby commented on GEODE-9043:


There are a couple of ways to address this, but the easiest looks to be adding 
ServerRefusedConnectionException to the catch clause in 
QueueManagerImpl.initializeConnections here:
{noformat}
for (ServerLocation server : servers) {
  Connection connection = null;
  try {
    connection = factory.createClientToServerConnection(server, true);
    exToLog = null;
->} catch (GemFireSecurityException | GemFireConfigException
      | ServerRefusedConnectionException e) {
    throw e;
  } catch (Exception e) {
    exToLog = e;
  }
{noformat}
That matches what happens with GemFireSecurityException or 
GemFireConfigException and causes an exception like:
{noformat}
Exception in thread "main" 
org.apache.geode.cache.client.ServerRefusedConnectionException: 
nn.nnn.nnn.nn(9838):41001(version:GEODE 1.13.0) refused connection: Peer or 
client version with ordinal 121 not supported. Highest known version is 1.13.0 
Client: /nn.nnn.nnn.nn:65532.
at 
org.apache.geode.internal.cache.tier.sockets.Handshake.readMessage(Handshake.java:331)
at 
org.apache.geode.cache.client.internal.ClientSideHandshakeImpl.handshakeWithServer(ClientSideHandshakeImpl.java:233)
at 
org.apache.geode.cache.client.internal.ConnectionImpl.connect(ConnectionImpl.java:107)
at 
org.apache.geode.cache.client.internal.ConnectionConnector.connectClientToServer(ConnectionConnector.java:75)
at 
org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:118)
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.initializeConnections(QueueManagerImpl.java:456)
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.start(QueueManagerImpl.java:293)
{noformat}

> A register interest attempt from a newer client to an older server throws a 
> NoSubscriptionServersAvailableException instead of a 
> ServerRefusedConnectionException
> -
>
> Key: GEODE-9043
> URL: https://issues.apache.org/jira/browse/GEODE-9043
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Reporter: Barrett Oglesby
>Priority: Major
>
> The exception in the register interest case is a bit confusing.
> If a 1.13.2 client attempts to connect to a 1.13.0 server and do a put, it 
> throws this ServerRefusedConnectionException with the exact cause:
> {noformat}
> Exception in thread "main" 
> org.apache.geode.cache.client.NoAvailableServersException: 
> org.apache.geode.cache.client.ServerRefusedConnectionException: 
> nn.nnn.nnn.nn(3047):41001(version:GEODE 1.13.0) refused connection: Peer 
> or client version with ordinal 121 not supported. Highest known version is 
> 1.13.0 Client: /nn.nnn.nnn.nn:64123.
>   at 
> org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:200)
>   at 
> org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:273)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:128)
>   at 
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:796)
>   at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:91)
> {noformat}
> If the client attempts to registerInterest, it throws this 
> NoSubscriptionServersAvailableException:
> {noformat}
> Exception in thread "main" 
> org.apache.geode.cache.NoSubscriptionServersAvailableException: 
> org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not 
> initialize a primary queue on startup. No queue servers available.
>   at 
> org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:190)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:432)
>   at 
> org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:870)
>   at 
> org.apache.geode.cache.Region.registerInterestForAllKeys(Region.java:1657)
> {noformat}
> The log does contain a message like below so it can be determined the exact 
> cause, buts not in the exception:
> {noformat}
> [warn 2021/03/15 11:59:04.100 PDT client  tid=0x1] Could not create a 
> new connection to server: nn.nnn.nnn.nn(9838):41001(version:GEODE 1.13.0) 
> refused connection: Peer or client version with ordinal 121 not supported. 
> Highest known version is 1.13.0 Client: 

[jira] [Created] (GEODE-9043) A register interest attempt from a newer client to an older server throws a NoSubscriptionServersAvailableException instead of a ServerRefusedConnectionException

2021-03-16 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9043:
--

 Summary: A register interest attempt from a newer client to an 
older server throws a NoSubscriptionServersAvailableException instead of a 
ServerRefusedConnectionException
 Key: GEODE-9043
 URL: https://issues.apache.org/jira/browse/GEODE-9043
 Project: Geode
  Issue Type: Bug
  Components: client/server
Reporter: Barrett Oglesby


The exception in the register interest case is a bit confusing.

If a 1.13.2 client attempts to connect to a 1.13.0 server and do a put, it 
throws this ServerRefusedConnectionException with the exact cause:
{noformat}
Exception in thread "main" 
org.apache.geode.cache.client.NoAvailableServersException: 
org.apache.geode.cache.client.ServerRefusedConnectionException: 
nn.nnn.nnn.nn(3047):41001(version:GEODE 1.13.0) refused connection: Peer or 
client version with ordinal 121 not supported. Highest known version is 1.13.0 
Client: /nn.nnn.nnn.nn:64123.
at 
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:200)
at 
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:273)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:128)
at 
org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:796)
at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:91)
{noformat}
If the client attempts to registerInterest, it throws this 
NoSubscriptionServersAvailableException:
{noformat}
Exception in thread "main" 
org.apache.geode.cache.NoSubscriptionServersAvailableException: 
org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not 
initialize a primary queue on startup. No queue servers available.
at 
org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:190)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:432)
at 
org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:870)
at 
org.apache.geode.cache.Region.registerInterestForAllKeys(Region.java:1657)
{noformat}
The log does contain a message like the one below, so the exact cause can be 
determined, but it's not in the exception:
{noformat}
[warn 2021/03/15 11:59:04.100 PDT client  tid=0x1] Could not create a new 
connection to server: nn.nnn.nnn.nn(9838):41001(version:GEODE 1.13.0) 
refused connection: Peer or client version with ordinal 121 not supported. 
Highest known version is 1.13.0 Client: /nn.nnn.nnn.nn:65323.
{noformat}
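
A minimal client sketch that hits both paths (hedged: the locator address, 
region name, and key/value are illustrative; subscriptions must be enabled on 
the pool for registerInterest):
{noformat}
import org.apache.geode.cache.NoSubscriptionServersAvailableException;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.geode.cache.client.NoAvailableServersException;

public class RegisterInterestRepro {
  public static void main(String[] args) {
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334) // illustrative locator address
        .setPoolSubscriptionEnabled(true)   // required for registerInterest
        .create();
    Region<String, String> region = cache
        .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
        .create("data");
    try {
      region.put("key", "value"); // fails against an older server
    } catch (NoAvailableServersException e) {
      e.printStackTrace(); // cause names the refused connection
    }
    try {
      region.registerInterestForAllKeys(); // fails without the refused-connection cause
    } catch (NoSubscriptionServersAvailableException e) {
      e.printStackTrace(); // the actual cause only appears in the client log
    }
    cache.close();
  }
}
{noformat}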




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-9040) The SingleThreadColocationLogger executorService is not shutdown when the server is stopped

2021-03-15 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-9040:
--

Assignee: Barrett Oglesby

> The SingleThreadColocationLogger executorService is not shutdown when the 
> server is stopped
> ---
>
> Key: GEODE-9040
> URL: https://issues.apache.org/jira/browse/GEODE-9040
> Project: Geode
>  Issue Type: Bug
>  Components: logging
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>
> When a server is shut down, its JVM remains alive because the ExecutorService 
> created by the SingleThreadColocationLogger is not terminated nor is its 
> thread a daemon:
> {noformat}
> "ColocationLogger for customer" #57 prio=5 os_prio=31 tid=0x7fb39d4e4000 
> nid=0xb203 waiting on condition [0x7dc58000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x000785268818> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The SingleThreadColocationLogger only gets created when there are missing 
> co-located regions.
> We can either terminate the ExecutorService or make its thread a daemon or 
> both.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-9040) The SingleThreadColocationLogger executorService is not shutdown when the server is stopped

2021-03-15 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9040:
--

 Summary: The SingleThreadColocationLogger executorService is not 
shutdown when the server is stopped
 Key: GEODE-9040
 URL: https://issues.apache.org/jira/browse/GEODE-9040
 Project: Geode
  Issue Type: Bug
  Components: logging
Reporter: Barrett Oglesby


When a server is shut down, its JVM remains alive because the ExecutorService 
created by the SingleThreadColocationLogger is not terminated nor is its thread 
a daemon:
{noformat}
"ColocationLogger for customer" #57 prio=5 os_prio=31 tid=0x7fb39d4e4000 
nid=0xb203 waiting on condition [0x7dc58000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x000785268818> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
The SingleThreadColocationLogger only gets created when there are missing 
co-located regions.

We can either terminate the ExecutorService or make its thread a daemon or both.
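
A minimal sketch of both options (class, field, and method names are 
illustrative, not the actual patch):
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ColocationLoggerExecutorSketch {
  // Option 1: a daemon thread cannot keep the JVM alive after the server stops
  private final ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
    Thread thread = new Thread(runnable, "ColocationLogger");
    thread.setDaemon(true);
    return thread;
  });

  // Option 2: terminate the ExecutorService explicitly when the logger stops
  void stop() throws InterruptedException {
    executor.shutdownNow();
    executor.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{noformat}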



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-9030) The PartitionedIndex arbitraryBucketIndex doesn't get reset when the BucketRegion defining it is moved

2021-03-12 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-9030:
--

 Summary: The PartitionedIndex arbitraryBucketIndex doesn't get 
reset when the BucketRegion defining it is moved
 Key: GEODE-9030
 URL: https://issues.apache.org/jira/browse/GEODE-9030
 Project: Geode
  Issue Type: Bug
  Components: querying
Reporter: Barrett Oglesby


This causes a RegionDestroyedException like this when executing a query 
containing a != clause:
{noformat}
Exception in thread "main" 
org.apache.geode.cache.client.ServerOperationException: remote server on 
10.166.145.16(client:27461:loner):58776:dfd3ba27:client: While performing a 
remote query
at 
org.apache.geode.cache.client.internal.AbstractOp.processChunkedResponse(AbstractOp.java:342)
at 
org.apache.geode.cache.client.internal.QueryOp$QueryOpImpl.processResponse(QueryOp.java:168)
at 
org.apache.geode.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:224)
at 
org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:197)
at 
org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:384)
at 
org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:284)
at 
org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:355)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:756)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:142)
at 
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:112)
at 
org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:797)
at 
org.apache.geode.cache.client.internal.QueryOp.execute(QueryOp.java:59)
at 
org.apache.geode.cache.client.internal.ServerProxy.query(ServerProxy.java:59)
at 
org.apache.geode.cache.query.internal.DefaultQuery.executeOnServer(DefaultQuery.java:327)
at 
org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:215)
at 
org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:197)
Caused by: org.apache.geode.cache.query.QueryInvocationTargetException: The 
Region on which query is executed may have been 
destroyed.BucketRegion[path='/__PR/_B__trade_0;serial=12;primary=false]
at 
org.apache.geode.internal.cache.PRQueryProcessor.executeQueryOnBuckets(PRQueryProcessor.java:264)
at 
org.apache.geode.internal.cache.PRQueryProcessor.executeSequentially(PRQueryProcessor.java:214)
at 
org.apache.geode.internal.cache.PRQueryProcessor.executeQuery(PRQueryProcessor.java:124)
at 
org.apache.geode.internal.cache.partitioned.QueryMessage.operateOnPartitionedRegion(QueryMessage.java:210)
Caused by: org.apache.geode.cache.RegionDestroyedException: 
BucketRegion[path='/__PR/_B__trade_0;serial=12;primary=false]
at 
org.apache.geode.internal.cache.LocalRegion.checkRegionDestroyed(LocalRegion.java:7352)
at 
org.apache.geode.internal.cache.LocalRegion.checkReadiness(LocalRegion.java:2757)
at 
org.apache.geode.internal.cache.BucketRegion.checkReadiness(BucketRegion.java:1437)
at 
org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8313)
at 
org.apache.geode.cache.query.internal.index.CompactRangeIndex.getSizeEstimate(CompactRangeIndex.java:331)
at 
org.apache.geode.cache.query.internal.CompiledComparison.getSizeEstimate(CompiledComparison.java:337)
at 
org.apache.geode.cache.query.internal.GroupJunction.organizeOperands(GroupJunction.java:146)
at 
org.apache.geode.cache.query.internal.AbstractGroupOrRangeJunction.filterEvaluate(AbstractGroupOrRangeJunction.java:148)
at 
org.apache.geode.cache.query.internal.CompiledJunction.filterEvaluate(CompiledJunction.java:190)
at 
org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:538)
at 
org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:53)
at 
org.apache.geode.cache.query.internal.DefaultQuery.executeUsingContext(DefaultQuery.java:357)
at 
org.apache.geode.internal.cache.PRQueryProcessor.executeQueryOnBuckets(PRQueryProcessor.java:248)
{noformat}
Here is an example query that fails:
{noformat}
SELECT * FROM /trade WHERE arrangementId = 'aId_1' AND tradeStatus.toString() 
!= 'CLOSED'
{noformat}
Here is a test that reproduces it:
 * start one server with region configured as PARTITION with:
 ** 2 buckets
 ** PartitionResolver that puts the first entry in bucket 0, every other entry 
in bucket 1
 * load N entries
 * the index in bucket 0 becomes the arbitraryBucketIndex
 * start a second server
 * rebalance
 * bucket 0 moves from the first server to the second server
 * run the 
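
A hedged sketch of the kind of PartitionResolver the steps above describe 
(names are illustrative; Geode derives the bucket id from the routing 
object's hashCode modulo the total bucket count, so with 2 buckets routing 
object 0 maps to bucket 0 and routing object 1 maps to bucket 1):
{noformat}
import java.io.Serializable;
import org.apache.geode.cache.EntryOperation;
import org.apache.geode.cache.PartitionResolver;

public class TwoBucketResolver implements PartitionResolver<Object, Object>, Serializable {
  @Override
  public Object getRoutingObject(EntryOperation<Object, Object> opDetails) {
    // Illustrative convention: key "key-0" is the first entry and pins to bucket 0
    return "key-0".equals(String.valueOf(opDetails.getKey())) ? 0 : 1;
  }

  @Override
  public String getName() {
    return "TwoBucketResolver";
  }

  @Override
  public void close() {}
}
{noformat}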

[jira] [Resolved] (GEODE-8992) When a GatewaySenderEventImpl is serialized, its operationDetail field is not included

2021-03-09 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-8992.

Fix Version/s: 1.15.0
   Resolution: Fixed

> When a GatewaySenderEventImpl is serialized, its operationDetail field is not 
> included
> --
>
> Key: GEODE-8992
> URL: https://issues.apache.org/jira/browse/GEODE-8992
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: blocks-1.15.0​, pull-request-available
> Fix For: 1.15.0
>
>
> This causes the operation to become less specific when the 
> {{GatewaySenderEventImpl}} is deserialized.
> Here is an example.
> If the original {{GatewaySenderEventImpl}} is a *PUTALL_CREATE* like:
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x10063|1;sequenceID=0;bucketId=99];action=0;operation=PUTALL_CREATE;region=/data;key=0;value=0;...]
> {noformat}
> Then, when the {{GatewaySenderEventImpl}} is serialized and deserialized, its 
> operation becomes a *CREATE*:
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x10063|1;sequenceID=0;bucketId=99];action=0;operation=CREATE;region=/data;key=0;value=0;...]
> {noformat}
> That's because {{GatewaySenderEventImpl.getOperation}} uses both *action* and 
> *operationDetail* to determine its operation:
> {noformat}
> public Operation getOperation() {
>   Operation op = null;
>   switch (this.action) {
> case CREATE_ACTION:
>   switch (this.operationDetail) {
> case ...
> case OP_DETAIL_PUTALL:
>   op = Operation.PUTALL_CREATE;
>   break;
> default:
>   op = Operation.CREATE;
>   break;
>   }
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-8992) When a GatewaySenderEventImpl is serialized, its operationDetail field is not included

2021-03-03 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-8992:
--

Assignee: Barrett Oglesby

> When a GatewaySenderEventImpl is serialized, its operationDetail field is not 
> included
> --
>
> Key: GEODE-8992
> URL: https://issues.apache.org/jira/browse/GEODE-8992
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: blocks-1.15.0​
>
> This causes the operation to become less specific when the 
> {{GatewaySenderEventImpl}} is deserialized.
> Here is an example.
> If the original {{GatewaySenderEventImpl}} is a *PUTALL_CREATE* like:
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x10063|1;sequenceID=0;bucketId=99];action=0;operation=PUTALL_CREATE;region=/data;key=0;value=0;...]
> {noformat}
> Then, when the {{GatewaySenderEventImpl}} is serialized and deserialized, its 
> operation becomes a *CREATE*:
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x10063|1;sequenceID=0;bucketId=99];action=0;operation=CREATE;region=/data;key=0;value=0;...]
> {noformat}
> That's because {{GatewaySenderEventImpl.getOperation}} uses both *action* and 
> *operationDetail* to determine its operation:
> {noformat}
> public Operation getOperation() {
>   Operation op = null;
>   switch (this.action) {
> case CREATE_ACTION:
>   switch (this.operationDetail) {
> case ...
> case OP_DETAIL_PUTALL:
>   op = Operation.PUTALL_CREATE;
>   break;
> default:
>   op = Operation.CREATE;
>   break;
>   }
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-8992) When a GatewaySenderEventImpl is serialized, its operationDetail field is not included

2021-03-02 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-8992:
--

 Summary: When a GatewaySenderEventImpl is serialized, its 
operationDetail field is not included
 Key: GEODE-8992
 URL: https://issues.apache.org/jira/browse/GEODE-8992
 Project: Geode
  Issue Type: Bug
  Components: wan
Reporter: Barrett Oglesby


This causes the operation to become less specific when the 
{{GatewaySenderEventImpl}} is deserialized.

Here is an example.

If the original {{GatewaySenderEventImpl}} is a *PUTALL_CREATE* like:
{noformat}
GatewaySenderEventImpl[id=EventID[id=31 
bytes;threadID=0x10063|1;sequenceID=0;bucketId=99];action=0;operation=PUTALL_CREATE;region=/data;key=0;value=0;...]
{noformat}
Then, when the {{GatewaySenderEventImpl}} is serialized and deserialized, its 
operation becomes a *CREATE*:
{noformat}
GatewaySenderEventImpl[id=EventID[id=31 
bytes;threadID=0x10063|1;sequenceID=0;bucketId=99];action=0;operation=CREATE;region=/data;key=0;value=0;...]
{noformat}
That's because {{GatewaySenderEventImpl.getOperation}} uses both *action* and 
*operationDetail* to determine its operation:
{noformat}
public Operation getOperation() {
  Operation op = null;
  switch (this.action) {
case CREATE_ACTION:
  switch (this.operationDetail) {
case ...
case OP_DETAIL_PUTALL:
  op = Operation.PUTALL_CREATE;
  break;
default:
  op = Operation.CREATE;
  break;
  }
...
{noformat}
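
A minimal sketch of the kind of fix this implies, assuming operationDetail is 
written and read alongside action in the event's toData/fromData pair (shown 
as an illustrative stand-in class, not the actual patch):
{noformat}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.geode.DataSerializable;

// Illustrative stand-in for GatewaySenderEventImpl showing only the two fields involved
public class EventSketch implements DataSerializable {
  private int action;
  private int operationDetail;

  @Override
  public void toData(DataOutput out) throws IOException {
    out.writeInt(action);
    out.writeInt(operationDetail); // previously omitted, so PUTALL_CREATE degraded to CREATE
  }

  @Override
  public void fromData(DataInput in) throws IOException {
    action = in.readInt();
    operationDetail = in.readInt(); // restored, so getOperation() can pick the specific Operation
  }
}
{noformat}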



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8926) CQ events can be missed while executing with initial results simultaneously with transactions

2021-02-08 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281309#comment-17281309
 ] 

Barrett Oglesby commented on GEODE-8926:


I attached a sequence diagram showing the interleaved behavior that causes 
the issue.

> CQ events can be missed while executing with initial results simultaneously 
> with transactions
> -
>
> Key: GEODE-8926
> URL: https://issues.apache.org/jira/browse/GEODE-8926
> Project: Geode
>  Issue Type: Bug
>  Components: cq
>Reporter: Barrett Oglesby
>Priority: Major
> Attachments: cq_with_transaction_behavior.png
>
>
> In this case, the event is neither in the initial results nor received in 
> the CqListener.
> A test that shows the behavior is:
> - 2 servers with:
>  - a root PR
>  - a colocated child PR
> In a client, asynchronously:
> - start a transaction that:
> - does N puts into the root PR
> - does 1 put into the child PR
> - commit the transaction
> In the client:
> create N CQs with initial results with: 'select * from /childPR'
> When the test succeeds, all the CQs either get the 1 event in their initial 
> results or in their CqListener.
> When the test fails, one or more CQs don't see the event either way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8926) CQ events can be missed while executing with initial results simultaneously with transactions

2021-02-08 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-8926:
---
Attachment: cq_with_transaction_behavior.png

> CQ events can be missed while executing with initial results simultaneously 
> with transactions
> -
>
> Key: GEODE-8926
> URL: https://issues.apache.org/jira/browse/GEODE-8926
> Project: Geode
>  Issue Type: Bug
>  Components: cq
>Reporter: Barrett Oglesby
>Priority: Major
> Attachments: cq_with_transaction_behavior.png
>
>
> In this case, the event is neither in the initial results nor received in 
> the CqListener.
> A test that shows the behavior is:
> - 2 servers with:
>  - a root PR
>  - a colocated child PR
> In a client, asynchronously:
> - start a transaction that:
> - does N puts into the root PR
> - does 1 put into the child PR
> - commit the transaction
> In the client:
> create N CQs with initial results with: 'select * from /childPR'
> When the test succeeds, all the CQs either get the 1 event in their initial 
> results or in their CqListener.
> When the test fails, one or more CQs don't see the event either way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-8926) CQ events can be missed while executing with initial results simultaneously with transactions

2021-02-08 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-8926:
--

 Summary: CQ events can be missed while executing with initial 
results simultaneously with transactions
 Key: GEODE-8926
 URL: https://issues.apache.org/jira/browse/GEODE-8926
 Project: Geode
  Issue Type: Bug
  Components: cq
Reporter: Barrett Oglesby


In this case, the event is neither in the initial results nor received in the 
CqListener.

A test that shows the behavior is:

- 2 servers with:
 - a root PR
 - a colocated child PR

In a client, asynchronously:

- start a transaction that:
- does N puts into the root PR
- does 1 put into the child PR
- commit the transaction

In the client:

create N CQs with initial results with: 'select * from /childPR'

When the test succeeds, all the CQs either get the 1 event in their initial 
results or in their CqListener.

When the test fails, one or more CQs don't see the event either way.
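
A minimal sketch of the client-side CQ step (assumes a client cache with 
subscriptions enabled; the listener body and CQ naming are illustrative):
{noformat}
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.query.CqAttributesFactory;
import org.apache.geode.cache.query.CqEvent;
import org.apache.geode.cache.query.CqQuery;
import org.apache.geode.cache.query.CqResults;
import org.apache.geode.cache.query.QueryService;
import org.apache.geode.cache.util.CqListenerAdapter;

public class CqWithInitialResults {
  // Returns true if the event arrived in the initial results; otherwise the
  // listener must deliver it, or the event was missed (the bug described above)
  static boolean executeCq(ClientCache cache, int index) throws Exception {
    QueryService queryService = cache.getQueryService();
    CqAttributesFactory attributes = new CqAttributesFactory();
    attributes.addCqListener(new CqListenerAdapter() {
      @Override
      public void onEvent(CqEvent event) {
        System.out.println("CQ event: " + event.getKey());
      }
    });
    CqQuery cq = queryService.newCq("cq-" + index, "select * from /childPR", attributes.create());
    CqResults<?> initialResults = cq.executeWithInitialResults();
    return !initialResults.isEmpty();
  }
}
{noformat}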



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-8916) The gfsh export stack traces command should include the locators

2021-02-03 Thread Barrett Oglesby (Jira)
Barrett Oglesby created GEODE-8916:
--

 Summary: The gfsh export stack traces command should include the 
locators
 Key: GEODE-8916
 URL: https://issues.apache.org/jira/browse/GEODE-8916
 Project: Geode
  Issue Type: Bug
Reporter: Barrett Oglesby


The gfsh export stack traces command should include the locators, but only 
includes the servers.

Here is an excerpt from a slack conversation showing the behavior:
{noformat}
Shelley Hughes-Godfrey  6:48 PM

I have a question about gfsh export stack-traces ...

"list members" shows me servers and locators ...

gfsh>list members
Member Count : 3
  Name                    | Id
------------------------- | ------------------------------------------------------
gemfire-cluster-server-0  | xx.xx.x.xxx(gemfire-cluster-server-0:1):41000
gemfire-cluster-locator-0 | 
xx.xx.x.xxx(gemfire-cluster-locator-0:1:locator):41000 [Coordinator]
gemfire-cluster-server-1  | xx.xx.x.xxx(gemfire-cluster-server-1:1):41000

But, if I don't specify members on the export stack-traces command, I just get 
the stacks for the servers.

gfsh>export stack-traces
stack-trace(s) exported to file: /path/stacktrace_1612316330340
On host : ...

Specifying a locator returns "No Members found"

gfsh>export stack-traces --member=gemfire-cluster-locator-0
No Members Found

Barry Oglesby  2 hours ago
That command excludes the locators. It uses this method in ManagementUtils to 
get just the normal members:

public static Set<DistributedMember> getAllNormalMembers(InternalCache cache) {
  return new HashSet<DistributedMember>(
      cache.getDistributionManager().getNormalDistributionManagerIds());
}

Shelley Hughes-Godfrey  1 hour ago

So, I also ran "export logs" with --member=

And that works

gfsh>list members
Member Count : 3
  Name                    | Id
------------------------- | ------------------------------------------------------
gemfire-cluster-server-0  | xx.xx.x.xxx(gemfire-cluster-server-0:1):41000
gemfire-cluster-locator-0 | 
xx.xx.x.xxx(gemfire-cluster-locator-0:1:locator):41000 [Coordinator]
gemfire-cluster-server-1  | xx.xx.x.xxx(gemfire-cluster-server-1:1):41000

gfsh>export logs --member=gemfire-cluster-locator-0
Logs exported to the connected member's file system: 
/path/exportedLogs_1612374651595.zip

Barry Oglesby  44 minutes ago

The ExportLogsCommand gets all the members including the locators:

Set<DistributedMember> targetMembers = getMembersIncludingLocators(groups,
    memberIds);

I tried a test by changing ExportStackTraceCommand.exportStackTrace:

From:

Set<DistributedMember> targetMembers = getMembers(group, memberNameOrId);

To:

Set<DistributedMember> targetMembers = getMembersIncludingLocators(group,
    memberNameOrId);

And the locator stack was exported:

*** Stack-trace for member locator at 2021/02/03 10:01:28.824 ***
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEODE-8827) The DiskRegionStats bytesOnlyOnDisk stat is not incremented during persistent region recovery

2021-01-20 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-8827.

Fix Version/s: 1.14.0
   Resolution: Fixed

> The DiskRegionStats bytesOnlyOnDisk stat is not incremented during persistent 
> region recovery
> -
>
> Key: GEODE-8827
> URL: https://issues.apache.org/jira/browse/GEODE-8827
> Project: Geode
>  Issue Type: Bug
>  Components: persistence, statistics
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
> Attachments: 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_gets.gif, 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_gets_with_change.gif, 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_restart.gif, 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_restart_with_change.gif,
>  DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_no_eviction.gif
>
>
> With a test like:
>  - 2 servers with partitioned region configured like:
>  ** persistence enabled
>  ** heap eviction with overflow enabled
>  - load enough entries to cause overflow
>  - shut down the servers
>  - restart the servers
>  - execute a function to get all entries in each server
> After the step to restart the servers, the bytesOnlyOnDisk stat is 0.
> After the step to get all entries, the bytesOnlyOnDisk stat is negative.
> The entriesInVM and entriesOnlyOnDisk stats are incremented as BucketRegions 
> are recovered from disk in LocalRegion.initializeStats here:
> {noformat}
> java.lang.Exception: Stack trace
>   at java.lang.Thread.dumpStack(Thread.java:1333)
>   at 
> org.apache.geode.internal.cache.LocalRegion.initializeStats(LocalRegion.java:10222)
>   at 
> org.apache.geode.internal.cache.BucketRegion.initializeStats(BucketRegion.java:2163)
>   at 
> org.apache.geode.internal.cache.AbstractDiskRegion.copyExistingRegionMap(AbstractDiskRegion.java:775)
>   at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeOwner(DiskStoreImpl.java:631)
>   at 
> org.apache.geode.internal.cache.DiskRegion.initializeOwner(DiskRegion.java:239)
>   at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1081)
>   at 
> org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:262)
>   at 
> org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:981)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:785)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:460)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:319)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2896)
>   at 
> org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:441)
>   at 
> org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:407)
>   at 
> org.apache.geode.internal.cache.PRHARedundancyProvider$2.run2(PRHARedundancyProvider.java:1640)
>   at 
> org.apache.geode.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:60)
>   at 
> org.apache.geode.internal.cache.PRHARedundancyProvider$2.run(PRHARedundancyProvider.java:1630)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The current LocalRegion.initializeStats method implementation is:
> {noformat}
> public void initializeStats(long numEntriesInVM, long numOverflowOnDisk,
> long numOverflowBytesOnDisk) {
>   getDiskRegion().getStats().incNumEntriesInVM(numEntriesInVM);
>   getDiskRegion().getStats().incNumOverflowOnDisk(numOverflowOnDisk);
> }
> {noformat}
> Even though numOverflowBytesOnDisk is passed into this method, it is ignored 
> as this logging shows:
> {noformat}
> [warn 2021/01/12 11:19:11.785 PST   
> tid=0x49] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4546560; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.791 PST   
> tid=0x4f] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4536320; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.797 PST   
> tid=0x4c] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4526080; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.800 PST   
> tid=0x48] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4546560; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.801 PST   
> tid=0x4e] XXX LocalRegion.initializeStats 

[jira] [Resolved] (GEODE-8278) Gateway sender queues using heap memory way above configured value after server restart

2021-01-14 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby resolved GEODE-8278.

Fix Version/s: 1.14.0
   Resolution: Fixed

> Gateway sender queues using heap memory way above configured value after 
> server restart
> ---
>
> Key: GEODE-8278
> URL: https://issues.apache.org/jira/browse/GEODE-8278
> Project: Geode
>  Issue Type: Bug
>  Components: eviction
>Reporter: Alberto Gomez
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> In a Geode system with the following characteristics:
>  * WAN replication
>  * partition redundant regions
>  * overflow configured for the gateway senders queues by means of persistence 
> and maximum queue memory set.
>  * gateway receivers stopped in one site (B)
>  * Operations sent to the site that does not have the gateway receivers 
> stopped (A)
> When operations are sent to site A, the gateway sender queues start to grow 
> as expected and the heap memory consumed by the queues does not grow 
> indefinitely given that there is overflow to disk when the limit is reached.
> But if a server is restarted, the restarted server will show much higher 
> heap memory usage than it did before the restart and than the other 
> servers show.
> This can even prevent the server from restarting if the heap memory it 
> requires is above the configured limit.
> According to the memory analyzer, the entries taking up the memory are 
> subclasses of ```VMThinDiskLRURegionEntryHeap```.
> The number of instances of this type is the same in the restarted server 
> as in the non-restarted servers, but on the restarted server they take 
> much more memory. The reason seems to be that in the restarted server the 
> ```value``` member attribute of the instances contains 
> ```VMCachedDeserializable``` objects, while in the non-restarted server 
> the attribute contains either ```null``` or ```GatewaySenderEventImpl``` 
> objects, which use much less memory than the ```VMCachedDeserializable``` 
> ones.
> If redundancy is not configured for the region, the problem does not 
> manifest, i.e. the heap memory used by the restarted server is similar to 
> what it was prior to the restart.
> If the node that was not restarted is then restarted, the previously 
> restarted node seems to release the extra memory (my guess is that it is 
> processing the other process's queue).
> Also, if traffic is sent to the Geode cluster again, eviction seems to 
> kick in, and after a short time the memory of the restarted server goes 
> down to the level it had before the restart.
> In summary, the problem seems to be that if a server does GII 
> (getInitialImage) from another server, eviction does not occur for 
> gateway sender queue entries.
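
A minimal sketch of the sender configuration described above (illustrative 
sender id and remote distributed-system id; parallel sender over a 
partitioned, redundant region):
{noformat}
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.wan.GatewaySender;
import org.apache.geode.cache.wan.GatewaySenderFactory;

public class SenderConfigSketch {
  static GatewaySender createSender(Cache cache) {
    GatewaySenderFactory factory = cache.createGatewaySenderFactory();
    factory.setParallel(true);           // partitioned, redundant regions
    factory.setPersistenceEnabled(true); // queue overflows/persists to disk
    factory.setMaximumQueueMemory(100);  // MB; overflow kicks in past this
    return factory.create("senderToSiteB", 2); // illustrative id and remote ds id
  }
}
{noformat}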



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (GEODE-8278) Gateway sender queues using heap memory way above configured value after server restart

2021-01-14 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby reassigned GEODE-8278:
--

Assignee: Barrett Oglesby  (was: Alberto Gomez)

> Gateway sender queues using heap memory way above configured value after 
> server restart
> ---
>
> Key: GEODE-8278
> URL: https://issues.apache.org/jira/browse/GEODE-8278
> Project: Geode
>  Issue Type: Bug
>  Components: eviction
>Reporter: Alberto Gomez
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
>
> In a Geode system with the following characteristics:
>  * WAN replication
>  * partition redundant regions
>  * overflow configured for the gateway senders queues by means of persistence 
> and maximum queue memory set.
>  * gateway receivers stopped in one site (B)
>  * Operations sent to the site that does not have the gateway receivers 
> stopped (A)
> When operations are sent to site A, the gateway sender queues start to grow 
> as expected and the heap memory consumed by the queues does not grow 
> indefinitely given that there is overflow to disk when the limit is reached.
> But if a server is restarted, the restarted server will show much higher 
> heap memory usage than it did before the restart and than the other 
> servers show.
> This can even prevent the server from restarting if the heap memory it 
> requires is above the configured limit.
> According to the memory analyzer, the entries taking up the memory are 
> subclasses of ```VMThinDiskLRURegionEntryHeap```.
> The number of instances of this type is the same in the restarted server 
> as in the non-restarted servers, but on the restarted server they take 
> much more memory. The reason seems to be that in the restarted server the 
> ```value``` member attribute of the instances contains 
> ```VMCachedDeserializable``` objects, while in the non-restarted server 
> the attribute contains either ```null``` or ```GatewaySenderEventImpl``` 
> objects, which use much less memory than the ```VMCachedDeserializable``` 
> ones.
> If redundancy is not configured for the region, the problem does not 
> manifest, i.e. the heap memory used by the restarted server is similar to 
> what it was prior to the restart.
> If the node that was not restarted is then restarted, the previously 
> restarted node seems to release the extra memory (my guess is that it is 
> processing the other process's queue).
> Also, if traffic is sent to the Geode cluster again, eviction seems to 
> kick in, and after a short time the memory of the restarted server goes 
> down to the level it had before the restart.
> In summary, the problem seems to be that if a server does GII 
> (getInitialImage) from another server, eviction does not occur for 
> gateway sender queue entries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8827) The DiskRegionStats bytesOnlyOnDisk stat is not incremented during persistent region recovery

2021-01-13 Thread Barrett Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barrett Oglesby updated GEODE-8827:
---
Attachment: 
DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_no_eviction.gif

> The DiskRegionStats bytesOnlyOnDisk stat is not incremented during persistent 
> region recovery
> -
>
> Key: GEODE-8827
> URL: https://issues.apache.org/jira/browse/GEODE-8827
> Project: Geode
>  Issue Type: Bug
>  Components: persistence, statistics
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_gets.gif, 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_gets_with_change.gif, 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_restart.gif, 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_restart_with_change.gif,
>  DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_no_eviction.gif
>
>
> With a test like:
>  - 2 servers with partitioned region configured like:
>  ** persistence enabled
>  ** heap eviction with overflow enabled
>  - load enough entries to cause overflow
>  - shut down the servers
>  - restart the servers
>  - execute a function to get all entries in each server
> After the step to restart the servers, the bytesOnlyOnDisk stat is 0.
> After the step to get all entries, the bytesOnlyOnDisk stat is negative.
> The entriesInVM and entriesOnlyOnDisk stats are incremented as BucketRegions 
> are recovered from disk in LocalRegion.initializeStats here:
> {noformat}
> java.lang.Exception: Stack trace
>   at java.lang.Thread.dumpStack(Thread.java:1333)
>   at 
> org.apache.geode.internal.cache.LocalRegion.initializeStats(LocalRegion.java:10222)
>   at 
> org.apache.geode.internal.cache.BucketRegion.initializeStats(BucketRegion.java:2163)
>   at 
> org.apache.geode.internal.cache.AbstractDiskRegion.copyExistingRegionMap(AbstractDiskRegion.java:775)
>   at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeOwner(DiskStoreImpl.java:631)
>   at 
> org.apache.geode.internal.cache.DiskRegion.initializeOwner(DiskRegion.java:239)
>   at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1081)
>   at 
> org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:262)
>   at 
> org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:981)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:785)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:460)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:319)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2896)
>   at 
> org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:441)
>   at 
> org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:407)
>   at 
> org.apache.geode.internal.cache.PRHARedundancyProvider$2.run2(PRHARedundancyProvider.java:1640)
>   at 
> org.apache.geode.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:60)
>   at 
> org.apache.geode.internal.cache.PRHARedundancyProvider$2.run(PRHARedundancyProvider.java:1630)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The current LocalRegion.initializeStats method implementation is:
> {noformat}
> public void initializeStats(long numEntriesInVM, long numOverflowOnDisk,
> long numOverflowBytesOnDisk) {
>   getDiskRegion().getStats().incNumEntriesInVM(numEntriesInVM);
>   getDiskRegion().getStats().incNumOverflowOnDisk(numOverflowOnDisk);
> }
> {noformat}
> Even though numOverflowBytesOnDisk is passed into this method, it is ignored 
> as this logging shows:
> {noformat}
> [warn 2021/01/12 11:19:11.785 PST   
> tid=0x49] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4546560; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.791 PST   
> tid=0x4f] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4536320; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.797 PST   
> tid=0x4c] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4526080; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.800 PST   
> tid=0x48] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4546560; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.801 PST   
> tid=0x4e] XXX LocalRegion.initializeStats 

[jira] [Commented] (GEODE-8827) The DiskRegionStats bytesOnlyOnDisk stat is not incremented during persistent region recovery

2021-01-13 Thread Barrett Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264356#comment-17264356
 ] 

Barrett Oglesby commented on GEODE-8827:


Here is a simpler test that shows a negative bytesOnlyOnDisk after server 
restart:

- start 1 server with persistent partitioned region
- load entries
- bounce server

After the step to bounce the server, the bytesOnlyOnDisk stat is negative.

The attached DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_no_eviction.gif 
chart shows this behavior.
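
A minimal sketch of the kind of fix this points to, assuming DiskRegionStats 
exposes a matching increment for the overflow-bytes stat (the 
incNumOverflowBytesOnDisk name mirrors the parameter name above and is not 
verified against the actual patch):
{noformat}
public void initializeStats(long numEntriesInVM, long numOverflowOnDisk,
    long numOverflowBytesOnDisk) {
  getDiskRegion().getStats().incNumEntriesInVM(numEntriesInVM);
  getDiskRegion().getStats().incNumOverflowOnDisk(numOverflowOnDisk);
  // Previously ignored: account for the bytes already on disk at recovery time
  getDiskRegion().getStats().incNumOverflowBytesOnDisk(numOverflowBytesOnDisk);
}
{noformat}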


> The DiskRegionStats bytesOnlyOnDisk stat is not incremented during persistent 
> region recovery
> -
>
> Key: GEODE-8827
> URL: https://issues.apache.org/jira/browse/GEODE-8827
> Project: Geode
>  Issue Type: Bug
>  Components: persistence, statistics
>Reporter: Barrett Oglesby
>Assignee: Barrett Oglesby
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_gets.gif, 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_gets_with_change.gif, 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_restart.gif, 
> DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_after_restart_with_change.gif,
>  DiskRegionStats_entriesOnlyOnDisk_bytesOnlyOnDisk_no_eviction.gif
>
>
> With a test like:
>  - 2 servers with partitioned region configured like:
>  ** persistence enabled
>  ** heap eviction with overflow enabled
>  - load enough entries to cause overflow
>  - shut down the servers
>  - restart the servers
>  - execute a function to get all entries in each server
> After the step to restart the servers, the bytesOnlyOnDisk stat is 0.
> After the step to get all entries, the bytesOnlyOnDisk stat is negative.
> The entriesInVM and entriesOnlyOnDisk stats are incremented as BucketRegions 
> are recovered from disk in LocalRegion.initializeStats here:
> {noformat}
> java.lang.Exception: Stack trace
>   at java.lang.Thread.dumpStack(Thread.java:1333)
>   at 
> org.apache.geode.internal.cache.LocalRegion.initializeStats(LocalRegion.java:10222)
>   at 
> org.apache.geode.internal.cache.BucketRegion.initializeStats(BucketRegion.java:2163)
>   at 
> org.apache.geode.internal.cache.AbstractDiskRegion.copyExistingRegionMap(AbstractDiskRegion.java:775)
>   at 
> org.apache.geode.internal.cache.DiskStoreImpl.initializeOwner(DiskStoreImpl.java:631)
>   at 
> org.apache.geode.internal.cache.DiskRegion.initializeOwner(DiskRegion.java:239)
>   at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1081)
>   at 
> org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:262)
>   at 
> org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:981)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:785)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:460)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:319)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2896)
>   at 
> org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:441)
>   at 
> org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:407)
>   at 
> org.apache.geode.internal.cache.PRHARedundancyProvider$2.run2(PRHARedundancyProvider.java:1640)
>   at 
> org.apache.geode.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:60)
>   at 
> org.apache.geode.internal.cache.PRHARedundancyProvider$2.run(PRHARedundancyProvider.java:1630)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The current LocalRegion.initializeStats method implementation is:
> {noformat}
> public void initializeStats(long numEntriesInVM, long numOverflowOnDisk,
> long numOverflowBytesOnDisk) {
>   getDiskRegion().getStats().incNumEntriesInVM(numEntriesInVM);
>   getDiskRegion().getStats().incNumOverflowOnDisk(numOverflowOnDisk);
> }
> {noformat}
> Even though numOverflowBytesOnDisk is passed into this method, it is ignored 
> as this logging shows:
> {noformat}
> [warn 2021/01/12 11:19:11.785 PST   
> tid=0x49] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4546560; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.791 PST   
> tid=0x4f] XXX LocalRegion.initializeStats numOverflowBytesOnDisk=4536320; 
> bytesOnlyOnDiskFromStats=0
> [warn 2021/01/12 11:19:11.797 PST   
> tid=0x4c] XXX 
