Re: Rescued Geode Protobuf

2021-05-05 Thread Udo Kohlmeyer
Thank you Michael,

I’ll definitely be looking into this in the future…

--Udo

From: Michael Oleske 
Date: Thursday, May 6, 2021 at 9:22 AM
To: dev@geode.apache.org 
Subject: Rescued Geode Protobuf
Hi Geode Friends!

Since I didn't see any movement on rescuing Geode Protobuf into another 
repository, I went ahead and did that.  It seemed like a better idea to me than 
trying to search commit history or old branches with git.  It is located at

https://github.com/moleske/geode-protobuf

I'm willing to move it into the Apache org if someone who is not lazy like me 
wants to help.  The only big changes are that I upgraded to Gradle 6.8.3 and now 
pull Geode dependencies from Maven.  I added some GitHub Actions for CI and to 
automatically open PRs, to try and keep it up to date and working, mostly 
because I won't dedicate that much time to it.  I just really dislike seeing 
code that seemed useful get lost like that (I spend a lot of time thinking 
about how to preserve technological history).

Anywho, hope this helps out anyone who might decide protobuf is worth spending 
time with.

-michael


Rescued Geode Protobuf

2021-05-05 Thread Michael Oleske
Hi Geode Friends!

Since I didn't see any movement on rescuing Geode Protobuf into another 
repository, I went ahead and did that.  It seemed like a better idea to me than 
trying to search commit history or old branches with git.  It is located at

https://github.com/moleske/geode-protobuf

I'm willing to move it into the Apache org if someone who is not lazy like me 
wants to help.  The only big changes are that I upgraded to Gradle 6.8.3 and now 
pull Geode dependencies from Maven.  I added some GitHub Actions for CI and to 
automatically open PRs, to try and keep it up to date and working, mostly 
because I won't dedicate that much time to it.  I just really dislike seeing 
code that seemed useful get lost like that (I spend a lot of time thinking 
about how to preserve technological history).

Anywho, hope this helps out anyone who might decide protobuf is worth spending 
time with.

-michael


[DRAFT] Geode Board Report

2021-05-05 Thread Dave Barnes
This is a review draft of our report to the Apache Board on the Geode
project. Please send me your feedback by Monday, May 10.
In particular, let me know if I omitted any publications or community
outreach efforts.
Thanks,
Dave

## Description:
The mission of Apache Geode is the creation and maintenance of software
related
to a data management platform that provides real-time, consistent access to
data-intensive applications throughout widely distributed cloud
architectures.

## Issues:
Karen Miller notified the PMC on 7-April-2021 of her intent to resign from
the
position of PMC Chair. Karen has held the position since Dec 2018. The
community
solicited nominations and conducted an election.
A resolution has been submitted separately from this report naming
Dan Smith (upthewaterspout) as the new Apache Geode PMC Chair.

## Membership Data:
Apache Geode was founded 2016-11-15 (over 4 years ago)
There are currently 114 committers and 54 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Donal Evans was added to the PMC on 2021-03-22
- Alberto Gomez was added as committer on 2021-02-17
- Kamilla Aslami was added as committer on 2021-04-19
- Matthew Reddington was added as committer on 2021-03-24
- Ray Ingles was added as committer on 2021-03-18

## Project Activity:
Three releases of Apache Geode were issued during this reporting period.
Work areas included:
- Improvements to existing tests and addition of new ones
- Improvements to WAN replication
- Several bug fixes and improvements in the area of disaster recovery and
  restart

Recent releases:
- 1.13.2 was released on 2021-03-29.
- 1.12.1 was released on 2021-02-26.
- 1.12.2 was released on 2021-04-22.

## Community Health:
Initiatives during this period include:
- Implementation of CODEOWNERS and CODEWATCHERS lists to facilitate PR
reviews
- For the geode-native client, ongoing discussions regarding C++ version
support
and coding standards

These initiatives and the releases listed above are reflected in Apache
Geode repository activity:
- 308 issues opened in JIRA, past quarter (30% increase)
- 236 issues closed in JIRA, past quarter (6% increase)
- 691 commits in the past quarter (50% increase)
- 62 code contributors in the past quarter (10% increase)
- 488 PRs opened on GitHub, past quarter (29% increase)
- 478 PRs closed on GitHub, past quarter (28% increase)

PMC member Barry Oglesby published [Transmitting Deltas Between Different
Apache Geode Distributed
Systems](
https://medium.com/@boglesby_2508/transmitting-deltas-between-different-apache-geode-distributed-systems-e46a3eae931
)
on 23-March-2021.


Re: DISCUSSION: Geode Native C++ Style and Formatting Guide

2021-05-05 Thread Jacob Barrett
clang-format and clang-tidy are enforcing many of the Google C++ rules and our 
deviations from them.

None of those tools enforce naming conventions.

I am not so much looking for this to be a holy war, nor do I think it has even 
remotely degraded to that. The original post was to agree on the set of 
currently undocumented deviations and secondarily determine if anyone has 
strong feelings towards changing any of those before we document it.

So far I am not hearing any strong feelings one way or another on the 
deviations, nor has anyone really raised any missing deviations.

-Jake


> On May 5, 2021, at 1:24 PM, Ernie Burghardt  wrote:
> 
> What are we missing/unable to format with our clang-tools?  
> These style discussions tend to become holy wars; hopefully we can avoid 
> this...
> If we can tool our largest concerns and perform good PR reviews looking for 
> valid algorithms, good mnemonic variable names and such, I think we'll be 
> doing just fine.
> 
> EB
> 
> On 5/3/21, 2:07 PM, "Robert Houghton"  wrote:
> 
>80 characters also feels arbitrary, especially with auto-formatters 
> (clang-tidy?) mangling some otherwise very-readable code.
> 
>From: Blake Bender 
>Date: Monday, May 3, 2021 at 11:23 AM
>To: dev@geode.apache.org 
>Subject: RE: DISCUSSION: Geode Native C++ Style and Formatting Guide
>My $0.02 on these:
> 
>Things I'd like to see us conform to Google style on:
>* I'd be happy to move to C++ 17
>* Would also be happy to remove forward declarations.  "I'm not a critic, 
> but I know what I hate," as it were, and I hate forward declarations.
>* I would also be happy with an 80-character line limit, though I don't 
> feel strongly about it.  100 may be consistent with Geode, but it still feels 
> arbitrary to me.
>* I would be very pleased to remove all the macros from our code.  I've 
> been bitten more than once in the past while debugging or refactoring our 
> code, because of ill-formed macros.
> 
>Google things I disagree with:
>* I don't like exceptions, but the amount of effort required to remove them 
> from the codebase is, IMO, unreasonably high.  Keep the exceptions; most of 
> the time they're used pretty judiciously.
>* I really, really, *really* (really?  Yes, really!) hate anything 
> resembling Hungarian prefix notation, and have permanent scars from decades 
> of reading it in Windows code.  Please don't ask me to put a random 'k' in 
> front of my enums - ick.
> 
>One other note: in the past, we've had conversations about "style only" 
> pull requests to fix some of these things, and the guidance we ended up with 
> has been to only fix this sort of thing while you're in the code working on a 
> fix or a feature.  I, for one, would welcome some PRs that just, say, renamed 
> a ton of member variables to replace "m_" prefix with a simple trailing "_", 
> perhaps fixed some of the more egregious and weird abbreviations, etc.  My 
> preference for bug fixes and feature work is that all of the code changes be 
> focused on stuff that's relevant to the fix/feature, and mixing it with 
> random style guide refactoring, I feel, muddies the waters for future 
> maintainers.
> 
>Thanks,
> 
>Blake
> 
>-Original Message-
>From: Jacob Barrett 
>Sent: Saturday, May 1, 2021 9:21 AM
>To: dev@geode.apache.org
>Subject: Re: DISCUSSION: Geode Native C++ Style and Formatting Guide
> 
>Great call outs!
> 
>> On May 1, 2021, at 7:57 AM, Mario Salazar de Torres 
>>  wrote:
>> 
>> 1.  Member variable names, as per the Google style guide, require a '_' char to 
>> be added at the end so they can be identified. Should we also adopt that?
>> For example, imagine you have a region mutex; should we name it 
>> 'regionMutex_' ?
> 
>I didn’t call this one out in my review of differences because we are already 
> following it, but I suppose, in combination with the camelCase difference, 
> we should probably call it out more specifically. Perhaps in our 
> documentation we should show examples of both local and member variables. Do 
> you think that will make it more clear?
> 
>> 2.  Also, I would like to point out that macros are discouraged by 
>> every C++ committee member I know.
>> What do you think about adding a notice saying: "Macros should be avoided 
>> and only used when there is no alternative”?
> 
>I think that is called out in various ways in a few places in the Google 
> guide, but I am more than happy for us to include stronger or clearer language 
> around this. Between constexpr and templates there are very few cases for macros 
> anymore.
>We mostly use macros only to handle non-standard attributes. When we move 
> to C++17 a lot of these will go away.
> 
>Thanks,
>Jake
> 
> 
> 



Re: DISCUSSION: Geode Native C++ Style and Formatting Guide

2021-05-05 Thread Ernie Burghardt
What are we missing/unable to format with our clang-tools?  
These style discussions tend to become holy wars; hopefully we can avoid this...
If we can tool our largest concerns and perform good PR reviews looking for 
valid algorithms, good mnemonic variable names and such, I think we'll be 
doing just fine.

EB

On 5/3/21, 2:07 PM, "Robert Houghton"  wrote:

80 characters also feels arbitrary, especially with auto-formatters 
(clang-tidy?) mangling some otherwise very-readable code.

From: Blake Bender 
Date: Monday, May 3, 2021 at 11:23 AM
To: dev@geode.apache.org 
Subject: RE: DISCUSSION: Geode Native C++ Style and Formatting Guide
My $0.02 on these:

Things I'd like to see us conform to Google style on:
* I'd be happy to move to C++ 17
* Would also be happy to remove forward declarations.  "I'm not a critic, 
but I know what I hate," as it were, and I hate forward declarations.
* I would also be happy with an 80-character line limit, though I don't 
feel strongly about it.  100 may be consistent with Geode, but it still feels 
arbitrary to me.
* I would be very pleased to remove all the macros from our code.  I've 
been bitten more than once in the past while debugging or refactoring our code, 
because of ill-formed macros.

Google things I disagree with:
* I don't like exceptions, but the amount of effort required to remove them from 
the codebase is, IMO, unreasonably high. 
 Keep the exceptions; most of the time they're used pretty judiciously.
* I really, really, *really* (really?  Yes, really!) hate anything 
resembling Hungarian prefix notation, and have permanent scars from decades of 
reading it in Windows code.  Please don't ask me to put a random 'k' in front of 
my enums - ick.

One other note: in the past, we've had conversations about "style only" 
pull requests to fix some of these things, and the guidance we ended up with 
has been to only fix this sort of thing while you're in the code working on a 
fix or a feature.  I, for one, would welcome some PRs that just, say, renamed a 
ton of member variables to replace "m_" prefix with a simple trailing "_", 
perhaps fixed some of the more egregious and weird abbreviations, etc.  My 
preference for bug fixes and feature work is that all of the code changes be 
focused on stuff that's relevant to the fix/feature, and mixing it with random 
style guide refactoring, I feel, muddies the waters for future maintainers.

Thanks,

Blake

-Original Message-
From: Jacob Barrett 
Sent: Saturday, May 1, 2021 9:21 AM
To: dev@geode.apache.org
Subject: Re: DISCUSSION: Geode Native C++ Style and Formatting Guide

Great call outs!

> On May 1, 2021, at 7:57 AM, Mario Salazar de Torres 
 wrote:
>
>  1.  Member variable names, as per the Google style guide, require a '_' char 
to be added at the end so they can be identified. Should we also adopt that?
> For example, imagine you have a region mutex; should we name it 
'regionMutex_' ?

I didn’t call this one out in my review of differences because we are already 
following it, but I suppose, in combination with the camelCase difference, we 
should probably call it out more specifically. Perhaps in our documentation we 
should show examples of both local and member variables. Do you think that will 
make it more clear?

>  2.  Also, I would like to point out that macros are discouraged by 
every C++ committee member I know.
> What do you think about adding a notice saying: "Macros should be avoided 
and only used when there is no alternative”?

I think that is called out in various ways in a few places in the Google 
guide, but I am more than happy for us to include stronger or clearer language 
around this. Between constexpr and templates there are very few cases for macros 
anymore.
We mostly use macros only to handle non-standard attributes. When we move 
to C++17 a lot of these will go away.

Thanks,
Jake





Re: Today's Community Meeting

2021-05-05 Thread Alexander Murmann
Sorry, our next meeting won't be April 2nd, but June 2nd. Apparently, I've lost 
all sense of time during the pandemic.

From: Alexander Murmann
Sent: Wednesday, May 5, 2021 09:39
To: geode 
Subject: Today's Community Meeting

Hi everyone!

It was great to see so many folks attend our first community meeting today and 
contribute to the discussion. An especially big thank you goes to Alberto Gomez 
for presenting his 
RFC!

Recording and short notes can be found on the Community Meeting Confluence 
page.

Our next meeting will be on Wednesday, April 2nd. I'd love to give someone else 
the opportunity to host the next session. Please raise your virtual hand if you 
are interested!


Please also respond to this email thread with any thoughts on how the next 
session could be even better.

Thanks!


Today's Community Meeting

2021-05-05 Thread Alexander Murmann
Hi everyone!

It was great to see so many folks attend our first community meeting today and 
contribute to the discussion. An especially big thank you goes to Alberto Gomez 
for presenting his 
RFC!

Recording and short notes can be found on the Community Meeting Confluence 
page.

Our next meeting will be on Wednesday, April 2nd. I'd love to give someone else 
the opportunity to host the next session. Please raise your virtual hand if you 
are interested!


Please also respond to this email thread with any thoughts on how the next 
session could be even better.

Thanks!


Re: JDK 16 Support?

2021-05-05 Thread John Blum
Hi Anthony-

Thank you for the quick reply.

The Spring Data Team is currently looking ahead towards Java 16 when building 
and running Spring Data examples, to get a sense for what works and what 
doesn't now that Java 16 is GA, and in anticipation of users coming to us with 
questions or problems.

Spring Framework itself aligns with and is based on LTS Java versions only, 
currently Java 8 with Spring Framework 5.  Spring Framework 6 will likely move 
the baseline to Java 11 or possibly even Java 17; we are not sure which yet.

Just want to share our findings and give advance notice.

Thanks,
John
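
For anyone who wants to see the failure in isolation, the following is a minimal 
sketch (class and variable names are made up) of what Geode's BufferPool appears 
to be doing per the stack trace quoted below: reflectively calling 
DirectByteBuffer.attachment(). On JDK 16 the setAccessible call throws 
InaccessibleObjectException unless the JVM is started with 
--add-opens java.base/java.nio=ALL-UNNAMED; that flag is presumably also part of 
any workaround for running Geode itself on Java 16, though Geode may need 
additional --add-opens flags for other internals.

    import java.lang.reflect.Method;
    import java.nio.ByteBuffer;

    public class DirectBufferAttachmentDemo {
      public static void main(String[] args) throws Exception {
        ByteBuffer parent = ByteBuffer.allocateDirect(32);
        // A slice of a direct buffer keeps a reference to its parent as the "attachment".
        ByteBuffer slice = parent.slice();

        // The same reflective access the BufferPool stack trace shows.
        Method attachment = slice.getClass().getMethod("attachment");

        // JDK 16+: throws java.lang.reflect.InaccessibleObjectException unless the JVM
        // was started with --add-opens java.base/java.nio=ALL-UNNAMED
        attachment.setAccessible(true);
        System.out.println(attachment.invoke(slice));
      }
    }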


From: Anthony Baker 
Sent: Wednesday, May 5, 2021 8:14 AM
To: u...@geode.apache.org 
Cc: geode 
Subject: Re: JDK 16 Support?

Thanks for reporting this John.  The next LTS version of Java (17) is due later 
this year.  I think Geode needs to at least support every LTS version of Java 
and clearly we would need to fix errors like this. Do you see a need to support 
Java 16 now?

Anthony


On May 5, 2021, at 7:57 AM, John Blum <jb...@vmware.com> wrote:

What is the plan to support Java 16 for Apache Geode?  Timeframe?

Running Apache Geode on a Java 16 Runtime produces errors like the following:


- org.apache.geode.InternalGemFireException: unable to retrieve underlying byte 
buffer
-  at 
org.apache.geode.internal.net.BufferPool.getPoolableBuffer(BufferPool.java:346)
-  at 
org.apache.geode.internal.net.BufferPool.releaseBuffer(BufferPool.java:310)
-  at 
org.apache.geode.internal.net.BufferPool.releaseSenderBuffer(BufferPool.java:213)
-  at org.apache.geode.internal.tcp.MsgStreamer.release(MsgStreamer.java:100)
-  at 
org.apache.geode.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:256)
-  at 
org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:306)
-  at 
org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:182)
-  at 
org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:511)
-  at 
org.apache.geode.distributed.internal.DistributionImpl.directChannelSend(DistributionImpl.java:346)
-  at 
org.apache.geode.distributed.internal.DistributionImpl.send(DistributionImpl.java:291)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2050)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1978)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2015)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1083)
-  at 
org.apache.geode.distributed.internal.StartupMessage.process(StartupMessage.java:279)
-  at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376)
-  at 
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441)
-  at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
-  at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
-  at 
org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:441)
-  at 
org.apache.geode.distributed.internal.ClusterOperationExecutors.doWaitingThread(ClusterOperationExecutors.java:410)
-  at 
org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
-  at java.base/java.lang.Thread.run(Thread.java:831)
- Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make 
public java.lang.Object java.nio.DirectByteBuffer.attachment() accessible: 
module java.base does not "opens java.nio" to unnamed module @40f9161a
-  at 
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
-  at 
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
-  at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
-  at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
-  at 
org.apache.geode.internal.net.BufferPool.getPoolableBuffer(BufferPool.java:343)
-  ... 22 common frames omitted
- 2021-04-30 14:57:13,638  INFO ributed.internal.membership.gms.Services: 606 - 
received leave request from 
10.99.199.28(CacheNotUsingSharedConfigurationIntegrationTest:29149):41001 
for 
10.99.199.28(CacheNotUsingSharedConfigurationIntegrationTest:29149):41001
- 2021-04-30 14:57:13,640  INFO ributed.internal.membership.gms.Services: 617 - 
JoinLeave.processMessage(LeaveRequestMessage) invoked.  isCoordinator=true; 
isStopping=false; cancelInProgress=false
- 2021-04-30 14:57:13,647 ERROR xecutors.LoggingUncaughtExceptionHandler:  92 - 
Uncaught exception in thread Thread[P2P message reader for 

Re: JDK 16 Support?

2021-05-05 Thread Anthony Baker
Thanks for reporting this John.  The next LTS version of Java (17) is due later 
this year.  I think Geode needs to at least support every LTS version of Java 
and clearly we would need to fix errors like this. Do you see a need to support 
Java 16 now?

Anthony


On May 5, 2021, at 7:57 AM, John Blum <jb...@vmware.com> wrote:

What is the plan to support Java 16 for Apache Geode?  Timeframe?

Running Apache Geode on a Java 16 Runtime produces errors like the following:


- org.apache.geode.InternalGemFireException: unable to retrieve underlying byte 
buffer
-  at 
org.apache.geode.internal.net.BufferPool.getPoolableBuffer(BufferPool.java:346)
-  at 
org.apache.geode.internal.net.BufferPool.releaseBuffer(BufferPool.java:310)
-  at 
org.apache.geode.internal.net.BufferPool.releaseSenderBuffer(BufferPool.java:213)
-  at org.apache.geode.internal.tcp.MsgStreamer.release(MsgStreamer.java:100)
-  at 
org.apache.geode.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:256)
-  at 
org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:306)
-  at 
org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:182)
-  at 
org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:511)
-  at 
org.apache.geode.distributed.internal.DistributionImpl.directChannelSend(DistributionImpl.java:346)
-  at 
org.apache.geode.distributed.internal.DistributionImpl.send(DistributionImpl.java:291)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2050)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1978)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2015)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1083)
-  at 
org.apache.geode.distributed.internal.StartupMessage.process(StartupMessage.java:279)
-  at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376)
-  at 
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441)
-  at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
-  at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
-  at 
org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:441)
-  at 
org.apache.geode.distributed.internal.ClusterOperationExecutors.doWaitingThread(ClusterOperationExecutors.java:410)
-  at 
org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
-  at java.base/java.lang.Thread.run(Thread.java:831)
- Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make 
public java.lang.Object java.nio.DirectByteBuffer.attachment() accessible: 
module java.base does not "opens java.nio" to unnamed module @40f9161a
-  at 
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
-  at 
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
-  at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
-  at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
-  at 
org.apache.geode.internal.net.BufferPool.getPoolableBuffer(BufferPool.java:343)
-  ... 22 common frames omitted
- 2021-04-30 14:57:13,638  INFO ributed.internal.membership.gms.Services: 606 - 
received leave request from 
10.99.199.28(CacheNotUsingSharedConfigurationIntegrationTest:29149):41001 
for 
10.99.199.28(CacheNotUsingSharedConfigurationIntegrationTest:29149):41001
- 2021-04-30 14:57:13,640  INFO ributed.internal.membership.gms.Services: 617 - 
JoinLeave.processMessage(LeaveRequestMessage) invoked.  isCoordinator=true; 
isStopping=false; cancelInProgress=false
- 2021-04-30 14:57:13,647 ERROR xecutors.LoggingUncaughtExceptionHandler:  92 - 
Uncaught exception in thread Thread[P2P message reader for 
10.99.199.28(CacheNotUsingSharedConfigurationIntegrationTest:29149):41001 
shared unordered uid=1 local port=53039 remote port=64063,10,main]
- org.apache.geode.InternalGemFireException: unable to retrieve underlying byte 
buffer
-  at 
org.apache.geode.internal.net.BufferPool.getPoolableBuffer(BufferPool.java:346)
-  at 
org.apache.geode.internal.net.BufferPool.releaseBuffer(BufferPool.java:310)
-  at 
org.apache.geode.internal.net.BufferPool.releaseReceiveBuffer(BufferPool.java:217)
-  at 
org.apache.geode.internal.tcp.Connection.releaseInputBuffer(Connection.java:1512)
-  at org.apache.geode.internal.tcp.Connection.run(Connection.java:1495)
-  at java.base/java.lang.Thread.run(Thread.java:831)
- Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make 

JDK 16 Support?

2021-05-05 Thread John Blum
What is the plan to support Java 16 for Apache Geode?  Timeframe?

Running Apache Geode on a Java 16 Runtime produces errors like the following:


- org.apache.geode.InternalGemFireException: unable to retrieve underlying byte 
buffer
-  at 
org.apache.geode.internal.net.BufferPool.getPoolableBuffer(BufferPool.java:346)
-  at 
org.apache.geode.internal.net.BufferPool.releaseBuffer(BufferPool.java:310)
-  at 
org.apache.geode.internal.net.BufferPool.releaseSenderBuffer(BufferPool.java:213)
-  at org.apache.geode.internal.tcp.MsgStreamer.release(MsgStreamer.java:100)
-  at 
org.apache.geode.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:256)
-  at 
org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:306)
-  at 
org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:182)
-  at 
org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:511)
-  at 
org.apache.geode.distributed.internal.DistributionImpl.directChannelSend(DistributionImpl.java:346)
-  at 
org.apache.geode.distributed.internal.DistributionImpl.send(DistributionImpl.java:291)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2050)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1978)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2015)
-  at 
org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1083)
-  at 
org.apache.geode.distributed.internal.StartupMessage.process(StartupMessage.java:279)
-  at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376)
-  at 
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441)
-  at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
-  at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
-  at 
org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:441)
-  at 
org.apache.geode.distributed.internal.ClusterOperationExecutors.doWaitingThread(ClusterOperationExecutors.java:410)
-  at 
org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
-  at java.base/java.lang.Thread.run(Thread.java:831)
- Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make 
public java.lang.Object java.nio.DirectByteBuffer.attachment() accessible: 
module java.base does not "opens java.nio" to unnamed module @40f9161a
-  at 
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
-  at 
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
-  at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
-  at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
-  at 
org.apache.geode.internal.net.BufferPool.getPoolableBuffer(BufferPool.java:343)
-  ... 22 common frames omitted
- 2021-04-30 14:57:13,638  INFO ributed.internal.membership.gms.Services: 606 - 
received leave request from 
10.99.199.28(CacheNotUsingSharedConfigurationIntegrationTest:29149):41001 
for 
10.99.199.28(CacheNotUsingSharedConfigurationIntegrationTest:29149):41001
- 2021-04-30 14:57:13,640  INFO ributed.internal.membership.gms.Services: 617 - 
JoinLeave.processMessage(LeaveRequestMessage) invoked.  isCoordinator=true; 
isStopping=false; cancelInProgress=false
- 2021-04-30 14:57:13,647 ERROR xecutors.LoggingUncaughtExceptionHandler:  92 - 
Uncaught exception in thread Thread[P2P message reader for 
10.99.199.28(CacheNotUsingSharedConfigurationIntegrationTest:29149):41001 
shared unordered uid=1 local port=53039 remote port=64063,10,main]
- org.apache.geode.InternalGemFireException: unable to retrieve underlying byte 
buffer
-  at 
org.apache.geode.internal.net.BufferPool.getPoolableBuffer(BufferPool.java:346)
-  at 
org.apache.geode.internal.net.BufferPool.releaseBuffer(BufferPool.java:310)
-  at 
org.apache.geode.internal.net.BufferPool.releaseReceiveBuffer(BufferPool.java:217)
-  at 
org.apache.geode.internal.tcp.Connection.releaseInputBuffer(Connection.java:1512)
-  at org.apache.geode.internal.tcp.Connection.run(Connection.java:1495)
-  at java.base/java.lang.Thread.run(Thread.java:831)
- Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make 
public java.lang.Object java.nio.DirectByteBuffer.attachment() accessible: 
module java.base does not "opens java.nio" to unnamed module @40f9161a
-  at 
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
-  at 

Re: Odg: Geode retry/acknowledge improvement

2021-05-05 Thread Alberto Gomez
Please, disregard my last e-mail.

I was having a parallel conversation by e-mail with Mario on this topic and 
sent the e-mail to the list by mistake.

BR,

Alberto

From: Alberto Gomez 
Sent: Wednesday, May 5, 2021 11:29 AM
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

You could reply to their latest e-mail to confirm that what Darrel suspected 
could happen. Let's see if, in that case, they are willing to collaborate.

Alberto

From: Mario Ivanac 
Sent: Wednesday, May 5, 2021 11:28 AM
To: dev@geode.apache.org 
Subject: Odg: Odg: Geode retry/acknowledge improvement

Hi,

I think we have the problem that Darrel suspected, and that some kind of 
notification could be sent peer-to-peer to acknowledge that the message is 
received on the receiving side.

Regarding the test with iptables, execution gets stuck with conserve-sockets set 
to either false or true.

BR,
Mario

From: Darrel Schneider 
Sent: April 30, 2021 18:38
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

In the geode hang you describe, would the forced tcp-reset using iptables have 
caused the put message send to fail with an exception when writing it to the socket? 
If so then I'd expect the geode Connection class to keep trying to send that 
message by creating a new connection to the member. It will keep doing this 
until the send is successful or the member leaves the cluster.

But if the tcp-reset allows the send to complete, without actually sending the 
request to the other member, then geode will be in trouble and will wait 
forever for a reply. Once geode successfully writes a p2p message on a socket, 
it expects it to be processed on the other side OR it expects the other side to 
leave the geode cluster. If neither of these happen then it will wait forever 
for a response. I've wondered in the past if this was a safe expectation. If 
not then do we need to send some type of msg id and after waiting for a reply 
for too long be able to check with the member to see if it has received the 
message we think we already sent?

You might see different behavior with your iptables test if you use 
conserve-sockets=false. In that case the socket used to write the p2p message 
is also used to read the response. But in the default conserve-sockets=true 
case, the reply comes on a different socket than the one used to send the 
message. It might be hard to get the thread doing the put for gfsh to use 
conserve-sockets=false. You could try just setting that on your server and the 
stuck thread stack should look different from what you are currently seeing.
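
(A minimal sketch of what Darrel suggests above, starting the server member with 
conserve-sockets set to false, might look like the following in Java. The 
"conserve-sockets" and "locators" property names are real Geode properties; the 
class name and locator address are placeholders, and the same property can also 
be set in gemfire.properties.)

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;

    public class ConserveSocketsOffServer {
      public static void main(String[] args) {
        // Start a server member with conserve-sockets disabled, so the reply to a p2p
        // message is read on the same socket that sent it (per Darrel's note above).
        Cache cache = new CacheFactory()
            .set("conserve-sockets", "false")
            .set("locators", "localhost[10334]")  // placeholder locator address
            .create();
        // ... create regions / start a cache server as usual ...
        cache.close();
      }
    }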

From: Anthony Baker 
Sent: Friday, April 30, 2021 8:43 AM
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

Can you explain the scenario further?  Does the sidecar proxy both the sending 
and receiving socket (geode creates 2 sockets for each p2p member)?  In normal 
cases, closing these sockets should clear up any unacknowledged messages, 
freeing up the thread.

Anthony


> On Apr 20, 2021, at 7:31 AM, Mario Ivanac  wrote:
>
> Hi,
>
> after analysis, we assume that the proxy, at reception of the packets, sends an ACK at 
> the TCP level, and that the proxy is restarted right after that moment.
> This is the reason we don't see TCP retries.
>
> A similar problem to this (but not packet loss) can be reproduced on geode
> if, on an existing connection, a TCP reset is received after the request is sent. In 
> that case, at reception of the reset,
> the connection will be closed and the thread will get stuck while waiting on the reply.
> I will add reproduction steps in the ticket.
>
> 
> From: Anthony Baker 
> Sent: April 19, 2021 22:54
> To: dev@geode.apache.org 
> Subject: Re: Geode retry/acknowledge improvement
>
> Do you have a tcpdump that demonstrates the packet loss? How long did you 
> wait for TCP to retry the failed packet delivery (sometimes this can be 
> tweaked with tcp_retries2).  Does this manifest as a failed socket connection 
> in geode?  That ought to trigger some error handling IIRC.
>
> Anthony
>
>
>> On Apr 19, 2021, at 7:16 AM, Mario Ivanac  wrote:
>>
>> Hi all,
>>
>> we have deployed a geode cluster in a kubernetes environment, and Istio/SideCars 
>> are injected between the cluster members.
>> While running traffic, if any Istio/SideCar is restarted, a thread will get 
>> stuck indefinitely while waiting for the reply to a sent message.
>> It seems that, due to the restarting of the proxy, in some cases messages are lost 
>> and the sending side waits indefinitely for a reply.
>>
>> 

Odg: Odg: Geode retry/acknowledge improvement

2021-05-05 Thread Mario Ivanac
I think that this is enough.

From: Alberto Gomez 
Sent: May 5, 2021 11:29
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

You could reply to their latest e-mail to confirm that what Darrel suspected 
could happen. Let's see if, in that case, they are willing to collaborate.

Alberto

From: Mario Ivanac 
Sent: Wednesday, May 5, 2021 11:28 AM
To: dev@geode.apache.org 
Subject: Odg: Odg: Geode retry/acknowledge improvement

Hi,

I think we have the problem that Darrel suspected, and that some kind of 
notification could be sent peer-to-peer to acknowledge that the message is 
received on the receiving side.

Regarding the test with iptables, execution gets stuck with conserve-sockets set 
to either false or true.

BR,
Mario

From: Darrel Schneider 
Sent: April 30, 2021 18:38
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

In the geode hang you describe, would the forced tcp-reset using iptables have 
caused the put message send to fail with an exception when writing it to the socket? 
If so then I'd expect the geode Connection class to keep trying to send that 
message by creating a new connection to the member. It will keep doing this 
until the send is successful or the member leaves the cluster.

But if the tcp-reset allows the send to complete, without actually sending the 
request to the other member, then geode will be in trouble and will wait 
forever for a reply. Once geode successfully writes a p2p message on a socket, 
it expects it to be processed on the other side OR it expects the other side to 
leave the geode cluster. If neither of these happen then it will wait forever 
for a response. I've wondered in the past if this was a safe expectation. If 
not then do we need to send some type of msg id and after waiting for a reply 
for too long be able to check with the member to see if it has received the 
message we think we already sent?

You might see different behavior with your iptables test if you use 
conserve-sockets=false. In that case the socket used to write the p2p message 
is also used to read the response. But in the default conserve-sockets=true 
case, the reply comes on a different socket than the one used to send the 
message. It might be hard to get the thread doing the put for gfsh to use 
conserve-sockets=false. You could try just setting that on your server and the 
stuck thread stack should look different from what you are currently seeing.

From: Anthony Baker 
Sent: Friday, April 30, 2021 8:43 AM
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

Can you explain the scenario further?  Does the sidecar proxy both the sending 
and receiving socket (geode creates 2 sockets for each p2p member)?  In normal 
cases, closing these sockets should clear up any unacknowledged messages, 
freeing up the thread.

Anthony


> On Apr 20, 2021, at 7:31 AM, Mario Ivanac  wrote:
>
> Hi,
>
> after analysis, we assume that the proxy, at reception of the packets, sends an ACK at 
> the TCP level, and that the proxy is restarted right after that moment.
> This is the reason we don't see TCP retries.
>
> A similar problem to this (but not packet loss) can be reproduced on geode
> if, on an existing connection, a TCP reset is received after the request is sent. In 
> that case, at reception of the reset,
> the connection will be closed and the thread will get stuck while waiting on the reply.
> I will add reproduction steps in the ticket.
>
> 
> From: Anthony Baker 
> Sent: April 19, 2021 22:54
> To: dev@geode.apache.org 
> Subject: Re: Geode retry/acknowledge improvement
>
> Do you have a tcpdump that demonstrates the packet loss? How long did you 
> wait for TCP to retry the failed packet delivery (sometimes this can be 
> tweaked with tcp_retries2).  Does this manifest as a failed socket connection 
> in geode?  That ought to trigger some error handling IIRC.
>
> Anthony
>
>
>> On Apr 19, 2021, at 7:16 AM, Mario Ivanac  wrote:
>>
>> Hi all,
>>
>> we have deployed a geode cluster in a kubernetes environment, and Istio/SideCars 
>> are injected between the cluster members.
>> While running traffic, if any Istio/SideCar is restarted, a thread will get 
>> stuck indefinitely while waiting for the reply to a sent message.
>> It seems that, due to the restarting of the proxy, in some cases messages are lost 
>> and the sending side waits indefinitely for a reply.
>>
>> https://issues.apache.org/jira/browse/GEODE-9075
>>
>> My question is, what is your estimation, 

Re: Odg: Geode retry/acknowledge improvement

2021-05-05 Thread Alberto Gomez
You could reply to their latest e-mail to confirm that what Darrel suspected 
could happen. Let's see if, in that case, they are willing to collaborate.

Alberto

From: Mario Ivanac 
Sent: Wednesday, May 5, 2021 11:28 AM
To: dev@geode.apache.org 
Subject: Odg: Odg: Geode retry/acknowledge improvement

Hi,

I think we have the problem that Darrel suspected, and that some kind of 
notification could be sent peer-to-peer to acknowledge that the message is 
received on the receiving side.

Regarding the test with iptables, execution gets stuck with conserve-sockets set 
to either false or true.

BR,
Mario

From: Darrel Schneider 
Sent: April 30, 2021 18:38
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

In the geode hang you describe, would the forced tcp-reset using iptables have 
caused the put message send to fail with an exception when writing it to the socket? 
If so then I'd expect the geode Connection class to keep trying to send that 
message by creating a new connection to the member. It will keep doing this 
until the send is successful or the member leaves the cluster.

But if the tcp-reset allows the send to complete, without actually sending the 
request to the other member, then geode will be in trouble and will wait 
forever for a reply. Once geode successfully writes a p2p message on a socket, 
it expects it to be processed on the other side OR it expects the other side to 
leave the geode cluster. If neither of these happen then it will wait forever 
for a response. I've wondered in the past if this was a safe expectation. If 
not then do we need to send some type of msg id and after waiting for a reply 
for too long be able to check with the member to see if it has received the 
message we think we already sent?

You might see different behavior with your iptables test if you use 
conserve-sockets=false. In that case the socket used to write the p2p message 
is also used to read the response. But in the default conserve-sockets=true 
case, the reply comes on a different socket than the one used to send the 
message. It might be hard to get the thread doing the put for gfsh to use 
conserve-sockets=false. You could try just setting that on your server and the 
stuck thread stack should look different from what you are currently seeing.

From: Anthony Baker 
Sent: Friday, April 30, 2021 8:43 AM
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

Can you explain the scenario further?  Does the sidecar proxy both the sending 
and receiving socket (geode creates 2 sockets for each p2p member)?  In normal 
cases, closing these sockets should clear up any unacknowledged messages, 
freeing up the thread.

Anthony


> On Apr 20, 2021, at 7:31 AM, Mario Ivanac  wrote:
>
> Hi,
>
> after analysis, we assume that the proxy, at reception of the packets, sends an ACK at 
> the TCP level, and that the proxy is restarted right after that moment.
> This is the reason we don't see TCP retries.
>
> A similar problem to this (but not packet loss) can be reproduced on geode
> if, on an existing connection, a TCP reset is received after the request is sent. In 
> that case, at reception of the reset,
> the connection will be closed and the thread will get stuck while waiting on the reply.
> I will add reproduction steps in the ticket.
>
> 
> From: Anthony Baker 
> Sent: April 19, 2021 22:54
> To: dev@geode.apache.org 
> Subject: Re: Geode retry/acknowledge improvement
>
> Do you have a tcpdump that demonstrates the packet loss? How long did you 
> wait for TCP to retry the failed packet delivery (sometimes this can be 
> tweaked with tcp_retries2).  Does this manifest as a failed socket connection 
> in geode?  That ought to trigger some error handling IIRC.
>
> Anthony
>
>
>> On Apr 19, 2021, at 7:16 AM, Mario Ivanac  wrote:
>>
>> Hi all,
>>
>> we have deployed a geode cluster in a kubernetes environment, and Istio/SideCars 
>> are injected between the cluster members.
>> While running traffic, if any Istio/SideCar is restarted, a thread will get 
>> stuck indefinitely while waiting for the reply to a sent message.
>> It seems that, due to the restarting of the proxy, in some cases messages are lost 
>> and the sending side waits indefinitely for a reply.
>>
>> https://issues.apache.org/jira/browse/GEODE-9075
>>
>> My question is, what is your estimation, how much effort/work is needed to 
>> implement message retry/acknowledge logic in geode,
>> to solve this problem?
>>
>> BR,
>> Mario
>



Odg: Odg: Geode retry/acknowledge improvement

2021-05-05 Thread Mario Ivanac
Hi,

I think we have the problem that Darrel suspected, and that some kind of 
notification could be sent peer-to-peer to acknowledge that the message is 
received on the receiving side.

Regarding the test with iptables, execution gets stuck with conserve-sockets set 
to either false or true.

BR,
Mario

From: Darrel Schneider 
Sent: April 30, 2021 18:38
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

In the geode hang you describe, would the forced tcp-reset using iptables have 
caused the put message send to fail with an exception when writing it to the socket? 
If so then I'd expect the geode Connection class to keep trying to send that 
message by creating a new connection to the member. It will keep doing this 
until the send is successful or the member leaves the cluster.

But if the tcp-reset allows the send to complete, without actually sending the 
request to the other member, then geode will be in trouble and will wait 
forever for a reply. Once geode successfully writes a p2p message on a socket, 
it expects it to be processed on the other side OR it expects the other side to 
leave the geode cluster. If neither of these happen then it will wait forever 
for a response. I've wondered in the past if this was a safe expectation. If 
not then do we need to send some type of msg id and after waiting for a reply 
for too long be able to check with the member to see if it has received the 
message we think we already sent?

You might see different behavior with your iptables test if you use 
conserve-sockets=false. In that case the socket used to write the p2p message 
is also used to read the response. But in the default conserve-sockets=true 
case, the reply comes on a different socket than the one used to send the 
message. It might be hard to get the thread doing the put for gfsh to use 
conserve-sockets=false. You could try just setting that on your server and the 
stuck thread stack should look different from what you are currently seeing.

From: Anthony Baker 
Sent: Friday, April 30, 2021 8:43 AM
To: dev@geode.apache.org 
Subject: Re: Odg: Geode retry/acknowledge improvement

Can you explain the scenario further?  Does the sidecar proxy both the sending 
and receiving socket (geode creates 2 sockets for each p2p member)?  In normal 
cases, closing these sockets should clear up any unacknowledged messages, 
freeing up the thread.

Anthony


> On Apr 20, 2021, at 7:31 AM, Mario Ivanac  wrote:
>
> Hi,
>
> after analysis, we assume that the proxy, at reception of the packets, sends an ACK at 
> the TCP level, and that the proxy is restarted right after that moment.
> This is the reason we don't see TCP retries.
>
> A similar problem to this (but not packet loss) can be reproduced on geode
> if, on an existing connection, a TCP reset is received after the request is sent. In 
> that case, at reception of the reset,
> the connection will be closed and the thread will get stuck while waiting on the reply.
> I will add reproduction steps in the ticket.
>
> 
> From: Anthony Baker 
> Sent: April 19, 2021 22:54
> To: dev@geode.apache.org 
> Subject: Re: Geode retry/acknowledge improvement
>
> Do you have a tcpdump that demonstrates the packet loss? How long did you 
> wait for TCP to retry the failed packet delivery (sometimes this can be 
> tweaked with tcp_retries2).  Does this manifest as a failed socket connection 
> in geode?  That ought to trigger some error handling IIRC.
>
> Anthony
>
>
>> On Apr 19, 2021, at 7:16 AM, Mario Ivanac  wrote:
>>
>> Hi all,
>>
>> we have deployed a geode cluster in a kubernetes environment, and Istio/SideCars 
>> are injected between the cluster members.
>> While running traffic, if any Istio/SideCar is restarted, a thread will get 
>> stuck indefinitely while waiting for the reply to a sent message.
>> It seems that, due to the restarting of the proxy, in some cases messages are lost 
>> and the sending side waits indefinitely for a reply.
>>
>> https://issues.apache.org/jira/browse/GEODE-9075
>>
>> My question is, what is your estimation, how much effort/work is needed to 
>> implement message retry/acknowledge logic in geode,
>> to solve this problem?
>>
>> BR,
>> Mario
>



Re: Region data corruption due to missing PdxTypes

2021-05-05 Thread Mario Salazar de Torres
Hi,

I forgot to mention that I enabled the ON_DISCONNECT_CLEAR_PDXTYPEIDS property.
Also, I tried a different scenario, one which does not exactly involve local 
PdxType retention:

  1.  Start a cluster with 1 locator and 3 servers, and persistence is disabled 
for PdxTypes.
  2.  Set up a region called "test-region" with persistence disabled. It doesn't 
matter whether it is replicated or partitioned.
  3.  In the client, instantiate the client region with PROXY region shortcut 
and establish the connection toward the cluster.
  4.  In the client, create a PdxInstance.
  5.  At this point, cluster is restarted, meaning that all the data is lost, 
included PdxTypes.
  6.  In the client, the PdxInstance created in step 4 is put into 
"test-region" with key "test".
  7.  In the client, the following query is executed: "SELECT * FROM 
/test-region WHERE value = -1".

The outcome is the same: the query fails with the message "Unknown pdx 
type=" and it won't work until the corrupted entry is removed.

I don't know if you've seen this kind of scenario before. I am just wondering 
in case this is something that needs to be fixed.

Thanks,
Mario.


From: Anthony Baker 
Sent: Wednesday, May 5, 2021 1:06 AM
To: dev@geode.apache.org 
Subject: Re: Region data corruption due to missing PdxTypes

Retaining local pdx types in the client after a disconnect will cause problems 
as you observed.  Take a look at the “ON_DISCONNECT_CLEAR_PDXTYPEIDS” property 
to improve this.

Anthony


> On May 4, 2021, at 4:36 AM, Mario Salazar de Torres 
>  wrote:
>
> Hi everyone,
>
> While debugging some coredumps in the native client related to 
> PdxTypeRegistry cleanup, I tried to reproduce the scenario with the Java 
> client API to see how it was handled.
> Thing is I've noticed that this scenario in the Java client might lead to 
> Geode storing a corrupted entry, meaning that queries won't work on those 
> regions containing corrupted entries.
> And with corrupted entries, I refer to entries using a missing PdxType. The 
> scenario involves a cluster restart. It's described below:
>
>  1.  Start a cluster with 1 locator and 3 servers, and persistence is 
> disabled for PdxTypes.
>  2.  Set up a region called "test-region" with persistence disabled. It 
> doesn't matter whether it is replicated or partitioned.
>  3.  In the client, instantiate the client region with PROXY region shortcut 
> and establish the connection toward the cluster.
>  4.  In the client, create a PdxInstance and put in into the "test-region" 
> with key "test".
>  5.  In the client, get the entry which key is "test", which turns out to be 
> the PdxInstance inserted in step 4.
>  6.  At this point, cluster is restarted, meaning that all the data is lost, 
> included PdxTypes.
>  7.  In the client, the PdxInstance obtained in step 5 is put into 
> "test-region" with key "test2"
>  8.  In the client, the following query is executed: "SELECT * FROM 
> /test-region WHERE value = -1".
> Such query fails with the message "Unknown pdx type=" and it 
> won't work until the corrupted entry is removed.
>
> Also, the above scenario could be solved by enabling persistence for 
> PdxTypes, but if you have an unrecoverable issue in your cluster and you need 
> to spin up a backup,
> it could happen that the PdxType of the PdxInstance obtained in step 5 is not present in 
> the backup, leading to the entry being inserted but, yet again, the PdxType 
> being missing.
>
> It's worth mentioning that in the native client, this scenario currently 
> results in a coredump, but no data corruption,
> given that, after losing the connection towards the cluster, the PdxTypeRegistry is 
> cleaned up and PdxTypes are obtained by their ID, rather than directly using 
> the object.
>
> My questions here are:
>
>  *   Have you seen this issue before?
>  *   Is there a way to verify that PdxTypes are present in the cluster before 
> writing an entry which holds some PdxInstances?
>
> Thanks,
> Mario.
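
For anyone who wants to try this with the Java client, here is a rough sketch of 
the client side of the scenario at the top of this thread (steps 3 to 7). The 
server-side setup from steps 1 and 2 is assumed to be in place, and the class 
name, PdxType name and locator address are placeholders:

    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;
    import org.apache.geode.cache.client.ClientRegionShortcut;
    import org.apache.geode.cache.query.SelectResults;
    import org.apache.geode.pdx.PdxInstance;

    public class PdxMissingTypeRepro {
      public static void main(String[] args) throws Exception {
        // Step 3: PROXY client region connected to the cluster.
        ClientCache cache = new ClientCacheFactory()
            .addPoolLocator("localhost", 10334)
            .create();
        Region<String, PdxInstance> region = cache
            .<String, PdxInstance>createClientRegionFactory(ClientRegionShortcut.PROXY)
            .create("test-region");

        // Step 4: create a PdxInstance; its PdxType gets registered in the cluster.
        PdxInstance value = cache.createPdxInstanceFactory("example.TestValue")
            .writeInt("value", -1)
            .create();

        // Step 5: restart the whole cluster here (PdxType persistence disabled),
        // so the registered PdxType is lost.

        // Step 6: put the old PdxInstance; its serialized bytes still reference the
        // now-missing PdxType id.
        region.put("test", value);

        // Step 7: fails with "Unknown pdx type=..." until the corrupted entry is removed.
        SelectResults<?> results = (SelectResults<?>) cache.getQueryService()
            .newQuery("SELECT * FROM /test-region WHERE value = -1")
            .execute();
        System.out.println(results.size());
      }
    }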