Re: [ANNOUNCE] Apache Geode 1.15.0

2022-06-22 Thread Anilkumar Gingade
Thanks to all who were involved in making this happen. As with every other
release, this is another stable, robust product delivered to the Geode community.

-Anil.

From: Owen Nichols 
Date: Wednesday, June 22, 2022 at 2:01 PM
To: u...@geode.apache.org , annou...@apache.org 
, dev@geode.apache.org 
Subject: [ANNOUNCE] Apache Geode 1.15.0

The Apache Geode community is pleased to announce the availability of
Apache Geode 1.15.0.

Geode is a data management platform that provides a database-like
consistency
model, reliable transaction processing and a shared-nothing architecture
to maintain very low latency performance with high concurrency processing.

Apache Geode 1.15.0 contains a number of improvements and bug fixes,
including JDK 17 support.  Users are encouraged to upgrade to this latest
release.
For the full list of changes please review the release notes at:
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.15.0

Release artifacts and documentation can be found at the project website:
https://geode.apache.org/releases/
https://geode.apache.org/docs/guide/115/about_geode.html

We would like to thank all the contributors who made this release possible.
Regards,
Owen Nichols on behalf of the Apache Geode team





Re: [PROPOSAL] RFC for migrating from springfox to springdoc

2022-05-05 Thread Anilkumar Gingade
+1. Thanks for the RFC. Looks good.
Since there is no big impact, does this need to wait until May 13th? In this
case, is it good enough to wait for a couple of approvals, say three?

-Anil.


From: Alexander Murmann 
Date: Thursday, May 5, 2022 at 4:09 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] RFC for migrating from springfox to springdoc
Thanks for your proposal, Patrick!

I wonder if this even warrants a proposal over a PR. There seems to be neither 
a downside nor a realistic alternative.

From: Patrick Johnson 
Sent: Thursday, May 5, 2022 13:39
To: dev@geode.apache.org 
Subject: [PROPOSAL] RFC for migrating from springfox to springdoc

Hello devs!

Please review this RFC: 
https://cwiki.apache.org/confluence/display/GEODE/Migration+from+springfox+to+springdoc
 on migrating from springfox to springdoc for our swagger needs and provide any 
feedback you have.

Review period until Friday, May 13th.

—Patrick Johnson


Re: Question about INDEX_THRESHOLD_SIZE

2022-03-11 Thread Anilkumar Gingade
Mario,

There is a similar test/example added by you in QueryWithRangeIndexDUnitTest:
testQueryWithWildcardAndIndexOnAttributeFromHashMap()
When I run that test (on develop), I see the results as expected:
*
Command result for : 
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 85.1964 ms; indexesUsed(1):IdIndex(Results: 
1)

Are you running your test with any additional change, as you are saying:
>> I was working on allowing INDEX_THRESHOLD_SIZE System property to override 
>> CompiledValue.RESULT_LIMIT.

If so, you need to look at the change and see why it's having that impact.
If not, please let me know what change can be made in that test to reproduce
the issue you are seeing; that will help in debugging/analyzing the issue.

-Anil.




On 3/11/22, 12:18 AM, "Mario Kevo"  wrote:

Hi,

It works without an index, but it doesn't work with an index.
When I revert the changes, it takes the INDEX_THRESHOLD_SIZE default value (100).
And if the entry that matches the condition is not in that result set, it will
not be printed.
Without index:
gfsh>query --query="SELECT e.key, e.value from
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 11.502283 ms; indexesUsed(0)

key | value
--- | 

300 | 
{"ID":300,"indexKey":0,"pkid":"300","shortID":null,"position1":{"mktValue":1945.0,"secId":"ORCL","secIdIndexed":"ORCL","secType":null,"sharesOutstanding":1944000.0,"underlyer":null,"pid":1944,"portfolioId":300,..
With index:
gfsh>query --query="SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 0
Query Trace : Query Executed in 8.784831 ms; indexesUsed(1):index1(Results:
100)
BR,
Mario

From: Anilkumar Gingade 
Sent: 10 March 2022, 23:16
To: dev@geode.apache.org 
Subject: Re: Question about INDEX_THRESHOLD_SIZE

Mario,

There are a few changes that happened around this area as part of the
GEODE-9632 fix; can you please revert that change and see if the query works
both with and without the index?
Looking at the code, it seems to restrict the number of index lookups that
need to be performed; certain latency/throughput-sensitive queries that are
not expecting exact results may use this (my guess), but by default it should
not result in unexpected results.

-Anil.


On 3/10/22, 6:50 AM, "Mario Kevo"  wrote:

Hi geode-dev,

Some time ago I was working on allowing the INDEX_THRESHOLD_SIZE System
property to override CompiledValue.RESULT_LIMIT.
After this change, this attribute will be taken into account if you set it.
But I need some clarification on this INDEX_THRESHOLD_SIZE attribute.
Why is it set by default to 100?
The main problem with this attribute is that if you want to get the correct
result, you need to know how many entries will be in the region when starting
the servers, and set it to that value or higher. Sometimes it is too hard to
know how many entries will be in the region, so it may be better to set it by
default to some higher number, such as Integer.MAX_VALUE.

Where is this attribute used?
It is used to get index results while doing queries.

What is the problem?
If we have INDEX_THRESHOLD_SIZE set to 500 and we have 1k entries, it can
happen that while doing a query it will fetch only 500 entries, the WHERE
clause cannot be fulfilled, and we get no results.
Let's see it with an example!

We have only one entry that matches the condition from the query,
INDEX_THRESHOLD_SIZE set to 500, and 1k entries in the region.
If we run the query without an index, we get the result.
gfsh>query --query="SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 10.750238 ms; indexesUsed(0)

key | value
--- | 

700 | 
{"ID":700,"indexKey":0

Re: Question about INDEX_THRESHOLD_SIZE

2022-03-10 Thread Anilkumar Gingade
Mario,

There are a few changes that happened around this area as part of the
GEODE-9632 fix; can you please revert that change and see if the query works
both with and without the index?
Looking at the code, it seems to restrict the number of index lookups that
need to be performed; certain latency/throughput-sensitive queries that are
not expecting exact results may use this (my guess), but by default it should
not result in unexpected results.

-Anil.


On 3/10/22, 6:50 AM, "Mario Kevo"  wrote:

Hi geode-dev,

Some time ago I was working on allowing the INDEX_THRESHOLD_SIZE System
property to override CompiledValue.RESULT_LIMIT.
After this change, this attribute will be taken into account if you set it.
But I need some clarification on this INDEX_THRESHOLD_SIZE attribute.
Why is it set by default to 100?
The main problem with this attribute is that if you want to get the correct
result, you need to know how many entries will be in the region when starting
the servers, and set it to that value or higher. Sometimes it is too hard to
know how many entries will be in the region, so it may be better to set it by
default to some higher number, such as Integer.MAX_VALUE.
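To illustrate the override behavior being discussed, here is a minimal,
self-contained sketch of reading a threshold from a JVM system property with a
default of 100. The property key used here is a made-up placeholder; the key
Geode actually reads may differ.

```java
public class IndexThresholdDemo {
    // Hypothetical property key, for illustration only.
    static final String KEY = "demo.Query.INDEX_THRESHOLD_SIZE";

    // Integer.getInteger returns the parsed system property if it is set,
    // otherwise the supplied default (mirroring a default of 100).
    static int indexThresholdSize() {
        return Integer.getInteger(KEY, 100);
    }

    public static void main(String[] args) {
        System.out.println(indexThresholdSize()); // default when unset
        System.setProperty(KEY, String.valueOf(Integer.MAX_VALUE));
        System.out.println(indexThresholdSize()); // overridden value
    }
}
```

Setting the property to a large value such as Integer.MAX_VALUE, as proposed,
would effectively disable the cap at the cost of larger intermediate index
result sets.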

Where is this attribute used?
It is used to get index results while doing queries.

What is the problem?
If we have INDEX_THRESHOLD_SIZE set to 500 and we have 1k entries, it can
happen that while doing a query it will fetch only 500 entries, the WHERE
clause cannot be fulfilled, and we get no results.
Let's see it with an example!

We have only one entry that matches the condition from the query,
INDEX_THRESHOLD_SIZE set to 500, and 1k entries in the region.
If we run the query without an index, we get the result.
gfsh>query --query="SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 1
Query Trace : Query Executed in 10.750238 ms; indexesUsed(0)

key | value
--- | 

700 | 
{"ID":700,"indexKey":0,"pkid":"700","shortID":null,"position1":{"mktValue":1945.0,"secId":"ORCL","secIdIndexed":"ORCL","secType":null,"sharesOutstanding":1944000.0,"underlyer":null,"pid":1944,"portfolioId":700,..
If we create an index and then run this query again, there is no result.
gfsh>query --query="SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
Result  : true
Limit   : 100
Rows: 0
Query Trace : Query Executed in 22.079016 ms;
indexesUsed(1):index1(Results: 500)
This happened because we have no luck getting the entry that matches the
condition into the intermediate results for the index.
So the questions are:
What if more entries enter the region, making the index return more entries
than this threshold allows? Then we're again in jeopardy that the query
condition will not match.
Why is this attribute set by default to 100?
Can we change it to Integer.MAX_VALUE by default, to be sure that we have the
correct result? What are the consequences?

BR,
Mario




Re: [DISCUSS] Testing and voting on release candidates

2022-02-07 Thread Anilkumar Gingade
Dan, very good initiative. It ensures a minimum of testing when someone votes,
removing the guess factor.
Putting together a script that covers the minimal expectations is a good idea;
it makes the task easier to accomplish.

-Anil.
 

On 2/7/22, 7:00 AM, "Alexander Murmann"  wrote:

This is awesome! Now I am excited to try this on our next vote! 

From: Dan Smith 
Sent: Friday, February 4, 2022 10:56
To: dev@geode.apache.org 
Subject: [DISCUSS] Testing and voting on release candidates

Hi all,

I'd like to suggest something that might make voting on releases a little
clearer and easier. I feel like we've been a bit vague about what kind of
testing PMC members are supposed to do on a release candidate, and I see
different folks (including myself) running different kinds of ad hoc testing.

I'd like to suggest that we should mostly focus on things that are either
Apache requirements for voting on releases or can't reasonably be tested in CI.

The apache release policy [1] says

"Before voting +1 PMC members are required to download the signed source 
code package, compile it as provided, and test the resulting executable on 
their own platform, along with also verifying that the package meets the 
requirements of the ASF policy on releases."

I checked in a script that can do the building and signature verification
for you [2]. My hope is that we can improve this script to do all of the
testing that we think is important to do on a developer's machine before
voting +1, and free up more time to look at the commits, source files, etc.,
and think about whether this is what we should be releasing.

I'm not trying to discourage any ad hoc testing someone feels like they 
want to do, but I do want to make sure that everyone is in agreement on what we 
should be doing before voting on a release and hopefully make it so that 
everyone feels comfortable voting without wondering what they are supposed to 
test.

[1] 
https://www.apache.org/legal/release-policy.html#approving-a-release
[2] 
https://github.com/apache/geode/tree/develop/dev-tools/release-testing

Thanks,
-Dan



Re: Creating index failed

2022-02-03 Thread Anilkumar Gingade
The other problem that exists is the case where two threads try to create an
index with the same name but different index expressions concurrently. I
assume there are ways this could happen.
One solution to the overall issue with index creation on a partitioned region
is to take a distributed lock on the index name. When an index creation
request comes in, it first acquires a distributed lock on the index name; any
additional index creation with that name will be blocked until the previous
index with the same name is created. During this time, if an index creation
request comes through, local or remote, the exception can be ignored, as only
one index creation will be in progress for the same request.
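The per-index-name serialization described above can be sketched in a single
JVM with plain `java.util.concurrent` locks. In an actual Geode cluster the
lock would instead come from a cluster-wide lock service; the class and method
names below (`IndexCreationGuard`, `createIndex`) are hypothetical, not Geode
APIs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: serializes index creation per index name within one JVM.
class IndexCreationGuard {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, String> createdIndexes = new ConcurrentHashMap<>();

    /**
     * Returns true if this call created the index, false if an identical
     * index already existed (duplicate requests are ignored, not errors).
     * Throws if the same name is reused with a different expression.
     */
    boolean createIndex(String name, String expression) {
        ReentrantLock lock = locks.computeIfAbsent(name, k -> new ReentrantLock());
        lock.lock(); // later requests for this name block here
        try {
            String existing = createdIndexes.putIfAbsent(name, expression);
            if (existing == null) {
                return true; // we won the race and created the index
            }
            if (existing.equals(expression)) {
                return false; // same definition already created: ignore
            }
            throw new IllegalStateException(
                "Index " + name + " already exists with expression " + existing);
        } finally {
            lock.unlock();
        }
    }
}
```

With this pattern, a duplicate request for the same name and expression is
silently absorbed, while a name collision with a different expression still
surfaces as an error, which matches the behavior Anil proposes.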

-Anil.

On 2/3/22, 4:41 AM, "Mario Kevo"  wrote:

Hi devs,

After implementing the ignoring of the exception, some tests failed, as we now
allow the command to pass again (although it does nothing, as the same index
was already created by a previous execution).
https://github.com/apache/geode/pull/7195


Here is a summary of how it works now.

When we create an index on a partitioned region, the locator sends a request
to all members to create an index on all the data they contain. The
partitioned region is specific in that it is normal to want to index all the
data, which is distributed across all members. This means every member will
try to create the index locally and also send index-creation requests to all
members on that site.
All members will check whether there is an already-created index, or whether
index creation is in progress, and wait for it. If a remotely originated
request comes in but the index is already created, the member will respond
with the Index and send an acknowledgment to the request's sender. If it is
not created already, the member will create the index and then respond to the
sender. This behavior is okay if we are using a small number of servers, or
using the --member option while creating indexes (which makes no sense on a
partitioned region, as already described below in this mail thread).

The problem arises when we are using a larger number of servers (8 or more),
or just with debugging on. This slows down the whole process, and then it can
happen that on some of the servers the remotely originated create-index
request arrives before the local request. In that case, the remotely
originated request will see that there is no index with that name and will
create a new one. But the problem happens after that: when the local request
arrives and the index is already created, it will think the index is from some
earlier execution and throw IndexNameConflictException.
https://github.com/apache/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegion.java#L8377

The create index command will fail (despite the index being created on all
data, some via local requests and some via remotely originated requests).
There are two problems with this implementation:

  1.  The user doesn't know that the index is created, and will try to
create it again, but then it will fail on all servers.
  2.  The cluster config is updated after the command finishes successfully,
which is correct, as we cannot update the cluster config before anything is
done.
The user can use the indexes despite the command having failed, but the
problem is that after a restart there is nothing in the cluster config, so the
indexes will not be recreated.

So the question is, what to do in this case? How do we avoid this issue?
Ignore the exceptions and fix the failing tests to expect that a new create
index command will pass; or disable the --member option if a partitioned
region is used (or just document it) and don't send requests to other members,
as the command will already be sent to all members. Or maybe something else?

BR,
Mario



From: Mario Kevo 
Sent: 14 December 2021, 14:06
To: dev@geode.apache.org 
Subject: Re: Creating index failed

Hi Alexander,

The cluster config is updated at the end of the command execution, and only
in case the command is successful.
I created a PR with Anilkumar's suggestion, but some tests failed.

Changes made after 1.15 release branch is cut (SHA# 8f7193c827ee3198ae374101221c02039c70a561)

2022-01-27 Thread Anilkumar Gingade
Hi Team, 1.15 release manager,

Here is the list of changes that were committed after SHA
8f7193c827ee3198ae374101221c02039c70a561, from which the 1.15 release branch
was cut.
If you are the owner of any of these changes and feel they need to be in the
1.15 release, please make sure they are in the release branch (you may have to
cherry-pick and backport them).

For the Release Manager:
Just wanted to make sure we take care of this; if this is already taken care
of, or a process to add these changes has been communicated, please ignore
this.

The list is generated using the command: "git log --pretty=fuller 
--since=2022-01-19"

GEODE-9853: get all members hosting bucket (#7144)
commit bbe9a3acf2f0812ef733dfe74f07fb9412c886e3
Author: Mario Ivanac <48509724+miva...@users.noreply.github.com> 
CommitDate: Thu Jan 27 11:23:29 2022 +0100

GEODE-9972: eliminate overnight step from release process (#7280)
commit bbe12c3217aa351422f3a61aa9a2cd1546a7b5db
Author: Owen Nichols <34043438+onichols-pivo...@users.noreply.github.com>
CommitDate: Thu Jan 27 00:03:08 2022 -0800

Revert "GEODE-9969: Fix unescaping the region name with underscore (#7282)" 
(#7311)
commit b7da7c25032aa921a73de3141c240b78849c334e
Author: Ray Ingles 
CommitDate: Wed Jan 26 22:23:54 2022 -0800

remove stray markdown file (#7314)
commit b33021fcbde1e92480e8759c1ed79fc1a467f2cf
Author: Dave Barnes 
CommitDate: Wed Jan 26 17:51:09 2022 -0800

GEODE-9883 Update Geode for Redis docs file (#7274)
commit 1eeccabe35466883611803ed2516145564c4cfa3
Author: Eric Zoerner 
CommitDate: Wed Jan 26 16:05:17 2022 -0800

GEODE-9973: Correct docs regarding P2P socket timeout behaviour (#7310)
commit d690f7b40e2795cdd50e59569d0c04f520089174
Author: Donal Evans 
CommitDate: Wed Jan 26 14:17:23 2022 -0800

   
roll develop to 1.16.0 now that support/1.15 has been created (#7309)
commit 0ec7f4807912645f2881f2e7ae5b9984a334f355
Author: Owen Nichols <34043438+onichols-pivo...@users.noreply.github.com>
CommitDate: Wed Jan 26 14:04:29 2022 -0800

add 1.14.3 to old versions and set as Benchmarks baseline on develop (#7307)
commit 53dae13fe02efb0f868d7d2a9ce0e03a376d679c
Author: Dick Cavender <1934150+dick...@users.noreply.github.com>
CommitDate: Wed Jan 26 11:54:49 2022 -0800

GEODE-9859: Do not copy entry if it is a destroyed entry (#7147)
commit ea91bf0f511a59b31886a137a625fdc031fb13a4
Author: Alberto Gomez 
CommitDate: Wed Jan 26 20:41:22 2022 +0100

Revert "GEODE-9973: Correct docs regarding P2P socket timeout behaviour 
(#7294)" (#7306)
commit 67d7725c818cf034670d3437173cc4ef719128f5
Author: Dave Barnes 
CommitDate: Tue Jan 25 17:03:35 2022 -0800

GEODE-9969: Fix unescaping the region name with underscore (#7282)
commit 4b5b30e379c35f17578251fe297c2b7fe7921fa4
Author: Mario Kevo <48509719+mk...@users.noreply.github.com>
CommitDate: Tue Jan 25 23:58:02 2022 +0100

GEODE-8616: Refactoring the test to remove deprecated APIs (#7301)
commit 6579b01dbfe2f7e15aaf05a4c712fcaf27498239
Author: Nabarun Nag 
CommitDate: Tue Jan 25 09:33:33 2022 -0800

GEODE-6751: Improved the tests for locator-gfsh compatibility (#7289)
commit 085b616dab895cdaa9f61f796405b11af82ffd80
Author: Nabarun Nag 
CommitDate: Mon Jan 24 17:19:49 2022 -0800

GEODE-9923: Add log message troubleshooting info from support team community 
(#7296)
commit 95210d19b5088dd0e62d4f815855d1776cd6eec7
Author: Dave Barnes 
CommitDate: Mon Jan 24 10:40:19 2022 -0800

GEODE-9958: Add gfsh-specific tests for Radish startup (#7297)
commit 77945531fafd566a0cbca7d05b5347b8ea299efc
Author: Jens Deppe 
CommitDate: Fri Jan 21 19:57:42 2022 -0800

GEODE-9922: Move Redis cross-slot checking to RegionProvider (#7295)
commit 7b0a88dbee36c6eb51513715af943f80ea6d93f9
Author: Donal Evans 
CommitDate: Fri Jan 21 17:54:03 2022 -0800

update redis svg to use new module name (#7288)
commit 66eb3f93aa44613a355515ee7d8e0bb36775d932
Author: Hale Bales 
CommitDate: Fri Jan 21 15:31:49 2022 -0800

GEODE-9885: Handle duplicated appends in Redis StringsDUnitTest (#7290)
commit b8dd86b846083a59ffe1aa56b489df60f4d75d39
Author: Donal Evans 
CommitDate: Fri Jan 21 15:28:10 2022 -0800

GEODE-9973: Correct docs regarding P2P socket timeout behaviour (#7294)
commit 763f590fe1e9a62ec21eae3a8c0cf0de2fee8594
Author: Donal Evans 
CommitDate: Fri Jan 21 15:22:39 2022 -0800

GEODE-9834: SRANDMEMBER Command Support (#7228)
commit a53c6da8dad75c8953de7e7ceb4bbfa545e5f405
Author: Kris10 
CommitDate: Fri Jan 21 10:14:07 2022 -0800

add 1.13.7 to old versions on develop (#7292)
commit a2ed24199f59f89fb87deca81280e243115f18a9
Author: Dick Cavender <1934150+dick...@users.noreply.github.com>
CommitDate: Thu Jan 20 15:51:59 2022 -0800

GEODE-9837: SUNIONSTORE Command Support (#7284)
commit 3a36962edfcd30aa3afa3a50813c63bfc155f699
Author: Kris10 
CommitDate: Thu Jan 20 14:11:41 2022 -0800

GEODE-9978: Remove test-container. (#7286)
commit 

Re: Proposal: Cutting 1.15 Release branch Tuesday, 25 January

2022-01-21 Thread Anilkumar Gingade
>  and remaining work is on processing release-blocker bugs.
Is there an estimate of when this will be done?

> will allow new work to proceed
Before taking on new work, release work/issues should be prioritized, unless
there are resources available to get the new work started (apart from working
on release issues).

If the release-blocker issues are down to 1 or 2, and there is a good
understanding of when these will be fixed/merged, then it will be a good idea
to cut a release branch.

-Anil
 

On 1/20/22, 5:43 PM, "Udo Kohlmeyer"  wrote:

+1 on cutting a release on 25 Jan 2022…

I implore all community members: if you have a feature that has not met the
25 Jan 2022 deadline, do not try to add it after the deadline. This will help
in the final stabilization phase of the release.

--Udo

From: Raymond Ingles 
Date: Friday, January 21, 2022 at 9:18 AM
To: dev@geode.apache.org 
Subject: Proposal: Cutting 1.15 Release branch Tuesday, 25 January
Hello Geode Dev Community,

We have a proposal to cut the 1.15 release branch this coming Tuesday, the 
25th of January, 2022. At this point it seems that development is 
feature-complete, and remaining work is on processing release-blocker bugs. 
Cutting the branch will allow new work to proceed without delaying the release.

Absent significant objections, the branch will be cut at some point that 
day.



Re: Query - bug fix - ServerConnection thread got stuck

2022-01-18 Thread Anilkumar Gingade
Yossi,

The issue GEM-1193 was fixed a few years back on an older version of Geode.
The fix should be in current versions.
Also, without more details (stack trace) it is hard to say whether it's
GEM-1193 or something new. Can you please create a new GEODE ticket with the
artifacts (logs, stack trace, error message)?
I'm curious how you are seeing/comparing this issue to GEM-1193.

-Anil.


From: Yossi Reginiano 
Reply-To: "u...@geode.apache.org" 
Date: Monday, January 17, 2022 at 8:05 AM
To: "'dev@geode.apache.org'" 
Cc: "'u...@geode.apache.org'" , Shadey Jabareen 
, Shivasharana Rao 
Subject: Query - bug fix - ServerConnection thread got stuck

Hi team,

We are running Geode 1.13.2 and we faced an issue where a ServerConnection
thread got stuck.
In the log we saw a ClosedChannelException, and we suspect it is related to
the issue reported in GemFire bug/fix GEM-1193.
How can we check whether this fix exists in Geode, and in which versions?

Thanks,
Yossi Reginiano

This email and the information contained herein is proprietary and confidential 
and subject to the Amdocs Email Terms of Service, which you may review at 
https://www.amdocs.com/about/email-terms-of-service


Re: [DISCUSS] proposal to pare down old-version testing

2022-01-04 Thread Anilkumar Gingade
+1 
Thanks for bringing this and taking care of this.

-Anil.


On 1/3/22, 10:41 AM, "Dan Smith"  wrote:

Looking at KnownVersion.java, we did make protocol changes in 1.12.1 and
1.13.2. So my suggestion would be to keep 1.12.0 and 1.13.1, but drop all the
other patch versions that aren't the latest.

-Dan

From: Dan Smith 
Sent: Monday, January 3, 2022 10:37 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] proposal to pare down old-version testing

+1 - this seems reasonable to me. If we do make a protocol change in a 
patch, we could potentially keep around an older patch version just in that 
specific case, but otherwise I think this makes sense.

-Dan

From: Anthony Baker 
Sent: Thursday, December 23, 2021 8:53 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] proposal to pare down old-version testing

Interesting data point:  40% of Maven Central downloads last month were for
version 1.4.0. Of course those numbers can easily be skewed by CI bots, but
still!

@Owen, I think your suggestion nicely improves practicality while
continuing to support strong compatibility. In many cases it's quite a bit
easier to upgrade the Geode server cluster compared to potentially many, many
client applications.  Supporting older client versions gives users time to
upgrade, quicker access to bug fixes, and helps avoid downtime.

+1

Anthony


> On Dec 22, 2021, at 7:13 PM, Owen Nichols  wrote:
>
> Since adopting our N-2 support policy, the list of released versions in 
/settings.gradle has ballooned to over 30 entries [1].
>
> CI tests use this list to confirm that we don’t break rolling upgrade 
ability or compatibility with older clients, but some of these tests don’t seem 
to scale well: PR#7203 to add the most recent 3 releases (bringing the total to 
33) is unable to pass CI after 8 tries.
>
> Possible solutions fall into two categories: keep the full list and throw 
developers and/or more hardware at the struggling tests, or concede that 
testing every version is not a scalable approach and find ways to shorten the 
list, e.g. randomly select a subset of old versions at runtime, or manually 
pare down the list.
>
> I propose to shorten the list [2] by keeping only the latest patch for 
each minor (unless the client or server protocol version has changed, so also 
keep the patch prior to 1.12.1 and prior to 1.13.2).  As long as a patch 
release doesn’t change the client or server protocol version, I see low value 
in testing upgrades from every patch version to every future version forever.  
The months between patch releases already provide plenty of upgrade coverage on 
that specific patch, then we can move on to the next…even if there could 
somehow be a corner-case where transitive property of upgradability doesn’t 
hold, most users probably take the latest-to-latest upgrade path anyway, which 
will always be tested.
>
> Let’s keep discussion open until 3PM PST Jan 5.  In case of no response, 
I will assume lazy consensus and update settings.gradle as proposed [2].
>
>
>
> [1] Current list from
https://github.com/apache/geode/blob/develop/settings.gradle#L72-L101 :
> 1.0.0-incubating
> 1.1.0
> 1.1.1
> 1.2.0
> 1.3.0
> 1.4.0
> 1.5.0
> 1.6.0
> 1.7.0
> 1.8.0
> 1.9.0
> 1.9.1
> 1.9.2
> 1.10.0
> 1.11.0
> 1.12.0
> 1.12.1
> 1.12.2
> 1.12.3
> 1.12.4
> 1.12.5
> 1.12.6
> 1.12.7*
> 1.13.0
> 1.13.1
> 1.13.2
> 1.13.3
> 1.13.4
> 1.13.5
> 1.13.6*
> 1.14.0
> 1.14.1
> 1.14.2*
> *=released, but not yet added to settings.gradle due to PR#7203 not able 
to pass CI due to size of version list
>
> [2] Proposed shortlist:
> 1.1.1
> 1.2.0
> 1.3.0
> 1.4.0
> 1.5.0
> 1.6.0
> 1.7.0
> 1.8.0
> 1.9.2
> 1.10.0
> 1.11.0
> 1.12.0
> 1.12.7
> 1.13.1
> 1.13.6
> 1.14.2
>
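The pare-down rule in the proposal (keep only the highest patch of each minor,
plus any versions pinned because the wire protocol changed in the following
patch) can be sketched as a small filter. The class and method names below are
illustrative, not part of the Geode build.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OldVersionShortlist {
    // Parse the patch number, tolerating suffixes like "0-incubating".
    private static int patchOf(String version) {
        String patch = version.split("\\.")[2].replaceAll("[^0-9].*$", "");
        return patch.isEmpty() ? 0 : Integer.parseInt(patch);
    }

    private static String minorOf(String version) {
        String[] parts = version.split("\\.");
        return parts[0] + "." + parts[1];
    }

    /** Keep the highest patch per minor, plus explicitly pinned versions
     *  (e.g. the patch just before a protocol change). Order is preserved. */
    static List<String> shortlist(List<String> all, Set<String> pinned) {
        Map<String, Integer> maxPatch = new HashMap<>();
        for (String v : all) {
            maxPatch.merge(minorOf(v), patchOf(v), Math::max);
        }
        List<String> kept = new ArrayList<>();
        for (String v : all) {
            if (pinned.contains(v) || patchOf(v) == maxPatch.get(minorOf(v))) {
                kept.add(v);
            }
        }
        return kept;
    }
}
```

For example, applying this to the 1.12/1.13/1.14 patch lines with 1.12.0 and
1.13.1 pinned reproduces the corresponding part of the proposed shortlist.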




Re: Creating index failed

2021-12-07 Thread Anilkumar Gingade
The IndexTask is working as expected, to handle multiple requests creating the
same index.
It looks like, when there are multiple index-creation requests, they are
handled properly (using the FutureTask).
What is the case where it throws Index already exists?
The other option is: if the index request matches (index type and expression),
we should just ignore the exception and consider the index successfully
created.

-Anil.
 

On 12/7/21, 8:49 AM, "Mario Kevo"  wrote:

Are you thinking about not sending it to the remote nodes or not sending 
requests from locator to the each node?

Also, there is a map where the indexTask is stored, and its putIfAbsent 
method does not seem to be working properly.
// This will return either the Index FutureTask or Index itself, based
// on whether the index creation is in process or completed.
Object ind = this.indexes.putIfAbsent(indexTask, indexFutureTask);
In case we change it to something like:
Object ind = null;
if(!this.indexes.containsKey(indexTask)) {
  ind = this.indexes.put(indexTask, indexFutureTask);
}
If it already has that indexTask, it will not run the creation again, whether 
that index was created by a remote request or locally. In that case, the 
command succeeds and the cluster config is updated.
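As an aside, the atomicity difference between the two approaches can be sketched outside Geode with a plain ConcurrentHashMap (class and value names here are illustrative, not Geode's actual types):

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutIfAbsentSketch {
  static final ConcurrentHashMap<String, String> indexes = new ConcurrentHashMap<>();

  // Returns null if this caller installed the task (and should run the
  // index creation), or the already-registered task otherwise.
  // putIfAbsent is a single atomic operation on the map.
  static String register(String indexName, String task) {
    return indexes.putIfAbsent(indexName, task);
  }

  public static void main(String[] args) {
    System.out.println(register("index1", "futureTask-A")); // null
    System.out.println(register("index1", "futureTask-B")); // futureTask-A
    // A containsKey()/put() pair, by contrast, is two separate operations;
    // between them another thread can insert the same key, so both threads
    // may decide to create the index -- the check-then-act race the
    // FutureTask-based putIfAbsent pattern is designed to avoid.
  }
}
```

Whether ignoring the "already exists" exception is preferable to losing putIfAbsent's atomicity is exactly the trade-off under discussion.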

BR,
Mario


    Šalje: Anilkumar Gingade 
Poslano: 7. prosinca 2021. 16:41
Prima: dev@geode.apache.org 
Predmet: Re: Creating index failed

In case you are planning to fix this: the probable fix is not to send the gfsh 
create command to all the nodes when it's a partitioned region..

On 12/7/21, 6:37 AM, "Mario Kevo"  wrote:

Hi Jason,

I agree with you that the user wanted to index all the data in the 
region when using a partitioned region. But when the command is not successful, 
the cluster config is not updated.
After the server restart, it will not have indexes as it is not stored 
in the cluster configuration.
So there should be some changes, as the index is created on all members 
but the command is not successful.
I'm working on a fix. As soon as possible I will create PR on the 
already mentioned ticket.

BR,
Mario

Šalje: Jason Huynh 
Poslano: 6. prosinca 2021. 18:45
Prima: dev@geode.apache.org 
Predmet: Re: Creating index failed

Hi Mario,

A lot of the indexing code pre-dates GFSH. The behavior you are seeing 
is when an index is created on a partition region.  When creating an index on a 
partition region, the idea is that the user wanted to index all the data in the 
region.  So the server will let all other servers know to create an index on 
the partition region.

This is slightly different for an index on a replicated region.  That 
is when the index can be created on a per-member basis, which is what I think 
the --member flag is for.

GFSH however defaults to sending the create index message to all 
members for any index type from what I remember and from what is being 
described. That is why you’ll see the race condition with indexes created on 
partitioned regions but the end result being that the index that someone wanted 
to create is either created or already there.

-Jason

On 12/6/21, 6:37 AM, "Mario Kevo"  wrote:

Hi devs,

While doing some testing, I found the issue which is already 
reported there. https://issues.apache.org/jira/browse/GEODE-7875

If we run the create index command, it creates the index locally and 
sends a request to create the index on the other members hosting that region.
The problem happens if the remote request arrives before the request 
from the locator; in that case, the request from the locator fails with the 
following message: Index "index1" already exists.  Create failed due to 
duplicate name.

This can be reproduced by running 6 servers with DEBUG log 
level (which slows the system down), creating a partitioned region, and 
then creating an index.

Why does the server send remote requests to the other members, given 
that they will also get a request from the locator to create the index?
Also, when running the gfsh command to create an index on one 
member, it sends create index requests to all other members. In that case, 
what is the purpose of the --member flag?

BR,
Mario







Re: API check error when adding a new method to a public interface

2021-11-23 Thread Anilkumar Gingade
Alberto,

I don’t think the intention is to avoid or discourage adding new methods. As 
you have seen, any change to an API, or any new API, has implications on 
other parts of the product, so it is good to validate/verify the dependencies 
across the product and get everything working together (without 
breaking any compatibility). If you have such a requirement, please propose it 
through an RFC and get approval.

-Anil. 

On 11/23/21, 8:44 AM, "Alberto Gomez"  wrote:

Hi,

After the introduction of GEODE-9702 
(https://issues.apache.org/jira/browse/GEODE-9702), adding a new method to a 
public interface will make the api-check-test-openjdk11 fail even if a default 
implementation is provided.

My question is if the goal of this change is to forbid this type of changes 
in minor versions or if there is a process to follow in order for changes of 
this type to be added.

I wanted to propose (in an RFC) the addition of a new parameter to the 
create gateway sender command that would require adding a new method to the 
GatewaySender interface as well as to other public interfaces and I was 
wondering if this will be possible at all, and if so, how should I proceed with 
it.

Thanks,

Alberto




Re: PROPOSAL: Remove WAN TX Batching Serialization Changes

2021-09-21 Thread Anilkumar Gingade
+1.
Is the idea just to create the Jira tickets? It is not clear from this whether 
the work will be owned and completed by 1.15.

-Anil.


On 9/21/21, 2:13 PM, "Jacob Barrett"  wrote:

Devs,

In addition to my discussion regarding the modularization of the WAN TX 
batching implementation, I would like to propose that we remove the 
serialization changes that went into 1.14 to support it. Since the feature is 
not complete in 1.14, this should only impact the associated tests in 1.14. I 
want to do this to eliminate the unnecessary serialization of the 
transaction ID and last-event flags, as well as the boolean that indicates if 
there is a transaction ID. As implemented right now, this data is serialized for 
both WAN and AEQ sender events that are part of a transaction, regardless of 
whether TX batching is enabled on the sender. The transaction ID contains both 
the 4-byte counter and the large membership ID.

https://github.com/apache/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/wan/GatewaySenderEventImpl.java#L712

Since this went out in 1.14.0, the removal would be treated like any other 
upgrade to the protocol, and a 1.14.1 version would not read or write any of 
those bytes. When talking to exactly a 1.14.0 version, the implementation would 
write only the false flag, and on read it would read the flag and ignore the 
rest as necessary. The tests related to TX batching would also need to be 
disabled.

Something like this:

  public void toData(DataOutput out,
      SerializationContext context) throws IOException {
    // intentionally skip the fields added in 1.14.0
    toDataPre_GEODE_1_14_0_0(out, context);
  }

  public void toDataPre_GEODE_1_14_1_0(DataOutput out,
      SerializationContext context) throws IOException {
    toDataPre_GEODE_1_14_0_0(out, context);
    // tell a 1.14.0 peer that there is no transaction data
    DataSerializer.writeBoolean(false, out);
  }

  public void fromData(DataInput in, DeserializationContext context)
      throws IOException, ClassNotFoundException {
    fromDataPre_GEODE_1_14_1_0(in, context);
  }

  public void fromDataPre_GEODE_1_14_1_0(DataInput in,
      DeserializationContext context)
      throws IOException, ClassNotFoundException {
    fromDataPre_GEODE_1_14_0_0(in, context);
    if (version == KnownVersion.GEODE_1_14_0.ordinal()) {
      // read and ignore the 1.14.0-only transaction fields
      boolean hasTransaction = DataSerializer.readBoolean(in);
      if (hasTransaction) {
        DataSerializer.readBoolean(in); // last-event flag
        context.getDeserializer().readObject(in); // transaction ID
      }
    }
  }

I would also propose that if 1.15.0 looks like it will ship without the 
modularization changes that we at least address the serialization changes here 
in a way that does not affect all gateways, WAN or AEQ.

If accepted I will write up two JIRAs, one to address the 1.14 removal and 
the other as a blocker on 1.15 to address the serialization issues.

Ok, chime in!

-Jake




Re: [ANNOUNCE] Apache Geode 1.14.0

2021-09-03 Thread Anilkumar Gingade
Great work team. Thanks Naba and others who helped in getting the release out.

Anil.


On 9/3/21, 11:58 AM, "nabarun nag"  wrote:

The Apache Geode community is pleased to announce the availability of
Apache Geode 1.14.0

Apache Geode is a data management platform that provides a database-like
consistency model, reliable transaction processing and a shared-nothing
architecture to maintain very low latency performance with high concurrency
processing.

This release includes a significant number of bug fixes, improvements
in current behavior along with the addition of a few statistics to
monitor the cluster health.

A few notable changes are:

1. The creation of OQL indexes now works on sub-regions.
2. Proper exceptions are thrown when a region is destroyed during
function execution.
3. Daemon threads are now used while rebalancing regions.
4. Gateway receivers can be configured with the same
hostname-for-senders and port. The reason for such a setup is
deploying a Geode cluster on a Kubernetes cluster where all GW
receivers are reachable from the outside world on the same IP and
port.
5. Disk stores are recovered in parallel during cluster restarts.
6. New option in GFSH command "start gateway sender" to control
clearing of existing queues.
7. New member field added in OQL query GFSH command to point to the
member on which the query will be executed.
8. No more ConcurrentModificationException when using JTA transactions.
9. Setting SNI server name is now not needed if endpoint verification
is disabled.
10. A new REST interface for disk-store creation has been introduced.
11. GFSH command to create defined indexes now works if connected to a
new locator which joined the cluster after indexes were defined.
12. Session state modules dependencies were cleaned up and made more 
efficient.
13. Limited retries while trying to create Lucene indexes to prevent
stack overflow issues.
14. A new statistic was added to get the heap memory occupied by the
gateway sender's queue.
15. maximum-time-between-pings set when creating a gateway receiver is
now honored instead of being ignored.
16. Deadlocks are prevented when java garbage collection and tombstone
collection occur simultaneously.
17. 'conserve-sockets' default value is now set to false when the
members are started.
18. Slower receivers with async-distribution-timeout greater than 0
are now not allowed with cluster TLS/SSL.
19. Clients trying to register interest in an older version server
will now receive a ServerRefusedConnectionException.
20. The speed of registering interest during rolling upgrades has been 
improved.
21. A new feature was added to print out the tenured heap in the log
files after garbage collection.
22. Bucket statistics were fixed.

For the full list of changes please review the release notes:

https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.14.0

The release artifacts can be downloaded from the project website:
https://geode.apache.org/releases/

The release documentation is available at:
https://geode.apache.org/docs/guide/114/about_geode.html

We would like to thank all the contributors that made the release possible.
Regards,
Nabarun Nag on behalf of the Apache Geode team



Re: "create region" cmd stuck on wan setup

2021-07-28 Thread Anilkumar Gingade
The recommendation with WAN setup is:
- Create/start WAN Senders first
- Create Regions
- Create/Start WAN receivers last
 
That way, when the WAN receivers are started, the regions already exist on all 
the sites. Sorry, I have not looked at your scripts...
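Concretely, the recommended ordering corresponds to running the gfsh create commands in this order on each site (region and sender names here are illustrative):

```
# On each site, in this order:
create gateway-sender --id=sender-to-remote --remote-distributed-system-id=2 --parallel
create region --name=testregion --type=PARTITION --gateway-sender-id=sender-to-remote
# Only after the regions exist on all sites:
create gateway-receiver
```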

-Anil.



On 7/28/21, 3:31 AM, "Alberto Bustamante Reyes" 
 wrote:

Hi Geode devs,

I have been analyzing an issue that occurs in the following scenario:

1) I start two Geode clusters (cluster1 & cluster2) with one locator and 
two servers each.
Both clusters host a partitioned region called "testregion", which is 
replicated using a parallel gateway sender and a gateway receiver.
These are the gfsh files I have been using for creating the clusters: 
https://gist.github.com/alb3rtobr/e230623255632937fa68265f31e97f3a

2) I run a client connected to cluster2 performing operations on testregion.

3) cluster1 is stopped and all persistent data is deleted. And then, I 
create cluster1 again.

4) At this point, the command to create "testregion" gets stuck.


After checking the thread stack and the code, I found that the problem is 
the following.

This thread is trapped in an infinite loop waiting for a bucket primary 
election at "PartitionedRegion.waitForNoStorageOrPrimary":


"Function Execution Processor4" tid=0x55
java.lang.Thread.State: TIMED_WAITING
at java.base@11.0.11/java.lang.Object.wait(Native Method)
-  waiting on org.apache.geode.internal.cache.BucketAdvisor@28be7ae0
at 
app//org.apache.geode.internal.cache.BucketAdvisor.waitForPrimaryMember(BucketAdvisor.java:1433)
at 
app//org.apache.geode.internal.cache.BucketAdvisor.waitForNewPrimary(BucketAdvisor.java:825)
at 
app//org.apache.geode.internal.cache.BucketAdvisor.getPrimary(BucketAdvisor.java:794)
at 
app//org.apache.geode.internal.cache.partitioned.RegionAdvisor.getPrimaryMemberForBucket(RegionAdvisor.java:1032)
at 
app//org.apache.geode.internal.cache.PartitionedRegion.getBucketPrimary(PartitionedRegion.java:9081)
at 
app//org.apache.geode.internal.cache.PartitionedRegion.waitForNoStorageOrPrimary(PartitionedRegion.java:3249)
at 
app//org.apache.geode.internal.cache.PartitionedRegion.getNodeForBucketWrite(PartitionedRegion.java:3234)
at 
app//org.apache.geode.internal.cache.PartitionedRegion.shadowPRWaitForBucketRecovery(PartitionedRegion.java:10110)
at 
app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:564)
at 
app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:443)
at 
app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderEventProcessor.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderEventProcessor.java:195)
at 
app//org.apache.geode.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ConcurrentParallelGatewaySenderQueue.java:183)
at 
app//org.apache.geode.internal.cache.PartitionedRegion.postCreateRegion(PartitionedRegion.java:1177)
at 
app//org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3050)
at 
app//org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2910)
at 
app//org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2894)
at 
app//org.apache.geode.cache.RegionFactory.create(RegionFactory.java:773)


After creating testregion, the sender queue's partitioned region is created. 
While that region's buckets are being recovered, the command is trapped in an 
infinite loop waiting for a primary bucket election at 
PartitionedRegion.waitForNoStorageOrPrimary.

This seems to be a known issue, because in 
PartitionedRegion.getNodeForBucketWrite there is the following comment before 
the call to waitForNoStorageOrPrimary (and the comment has been there since 
Geode's first commit!):

// Possible race with loss of redundancy at this point.
// This loop can possibly create a soft hang if no primary is ever 
selected.
// This is preferable to returning null since it will prevent obtaining 
the
// bucket lock for bucket creation.
return waitForNoStorageOrPrimary(bucketId, "write");
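The soft hang that comment warns about can be sketched in isolation (a simplified stand-in, not the actual BucketAdvisor code, which waits on the advisor object rather than sleeping):

```java
import java.util.function.Supplier;

public class SoftHangSketch {
  // Simplified stand-in for the wait-for-primary loop: poll until a primary
  // appears or a deadline passes. As described above, the real loop can
  // retry indefinitely, so a bucket that never elects a primary parks the
  // caller forever -- the state seen in the thread dump above.
  static String waitForPrimary(Supplier<String> primary, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    String result;
    while ((result = primary.get()) == null
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(5);
    }
    return result; // null means we timed out with no primary elected
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(waitForPrimary(() -> "server-1", 100)); // server-1
    System.out.println(waitForPrimary(() -> null, 100));       // null
  }
}
```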

Any idea about why the primary bucket is not elected?

It seems the failure is related to the fact that "testregion" is 
receiving updates from the receiver before the "create region" command has 
finished. If the test is repeated without traffic on cluster2, or if I create 
cluster1's receiver after creating "testregion", this problem does not 
happen.

Is there any recommendation on the startup order of regions, senders and 
receivers?

Re: Issue while upgrading from Gemfire to Geode 1.13.2

2021-06-03 Thread Anilkumar Gingade
Can you be more specific about the GemFire version? Is it a 
supported/enterprise GemFire version? As far as I know, the Geode community has 
never tried upgrading from a GemFire version to Geode.

On 6/3/21, 1:34 PM, "Jehu Jair RuizVillegas" 
 wrote:

Hi team

We are upgrading from GemFire to Geode 1.13.2. This is mostly a technical 
upgrade, and we are facing an exception while trying to bring up the JVMs; the 
exception is below:

Exception in thread "main" java.lang.ExceptionInInitializerError

at 
amdocs.imdg.statistics.LatencyManagerFactory.getLatencyManager(LatencyManagerFactory.java:20)

at 
amdocs.imdg.functions.GetRTNotifications.(GetRTNotifications.java:33)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:348)

at 
org.apache.geode.internal.ClassPathLoader.forName(ClassPathLoader.java:201)

After investigating with jdb, we found that one of the classes that was 
supposed to be initialized during JVM startup was not called:

  
  
  <initializer>
    <class-name>amdocs.imdg.initializations.Startup</class-name>
  </initializer>

This is how our cache.xml is defined; this Startup class is not being called 
after upgrading to Geode. We set some breakpoints in the Startup constructor 
and they are never hit.

Thanks & regards.




Re: [Discuss] New feature work approval state in Geode project?

2021-05-28 Thread Anilkumar Gingade
My thoughts: I can't make a distinction between a feature and a bug; either way 
it's a change to the codebase. If a change has a greater impact, is sensitive, 
or takes time to build, then it is a candidate to bring up and discuss before 
implementation. Sometimes it's hard to determine/distinguish; we developers 
should make a good judgement call and be open to suggestions. We currently have 
the RFC process; adding more process/steps adds additional onus. We can tweak 
the RFC process or replace it with a better option, but let's keep it 
simple/minimal.


On 5/28/21, 11:57 AM, "Jacob Barrett"  wrote:



> On May 28, 2021, at 11:24 AM, Mark Hanson  wrote:
> 
> I think the key difference between what Bill and Jake are saying is that 
Bill is saying a new feature needs approval in a more structured way. I think 
Bill's process is open the jira, then it is "approved" or "won't do" then work 
starts. I think what Jake is saying is a little less structured. That may be my 
reading though.

The difference is between bugs vs. features. We have a process for features: 
lazy consensus on RFCs. We have a process for minor 
features/improvements/tasks: greedy consensus on PR approvals.

-Jake




Re: [Discuss] New feature work approval state in Geode project?

2021-05-28 Thread Anilkumar Gingade
Can you elaborate on this?
Are we saying that if I (a Geode dev) find a bug or a feature that I need for 
my application, I need to get approval to create a ticket and work on it? We 
already have the RFC process; won't that suffice?

-Anil.

On 5/28/21, 10:36 AM, "Mark Hanson"  wrote:

Hi All,

There has been some discussion about adding a new "approved" state to 
Geode Jira for features, or something like it, to help prevent work being done 
that doesn’t make it into the project. What do people think?

Thanks,
Mark



Re: Cleaning up the codebase - use curly braces everywhere

2021-05-27 Thread Anilkumar Gingade
+1

Instead of one big merge, can this be done at the package level? Just a thought.

-Anil.  

On 5/27/21, 10:51 AM, "Dale Emery"  wrote:

We might also use IntelliJ to enforce any guidelines that we want to 
enforce. You can run inspections on the command line: 
https://www.jetbrains.com/help/idea/command-line-code-inspector.html

An advantage of using IntelliJ inspections is that we can provide an 
inspection profile that treats violations as errors. Then developers can use 
this profile while editing to spot violations immediately as they’re introduced.

A disadvantage is that this somewhat couples Geode to a particular IDE.

Dale

From: Donal Evans 
Date: Thursday, May 27, 2021 at 10:22 AM
To: dev@geode.apache.org 
Subject: Cleaning up the codebase - use curly braces everywhere
Hi Geode dev,

I've recently been looking at ways to improve code quality in small ways 
throughout the codebase, and as a starting point, I thought it would be good to 
make it so that we're consistently using curly braces for control flow 
statements everywhere, since this is something that's specifically called out 
in the Geode Code Style Guide wiki page[1] as one of the "more important 
points" of our code style.
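For context, the brace rule being enforced looks like this (a contrived example, not Geode code):

```java
public class BraceStyleExample {
  // The inspection flags single-statement bodies written without braces,
  // e.g.:  if (value > max) return max;
  // The Geode style guide calls for the braced form below.
  static int clamp(int value, int max) {
    if (value > max) {
      return max;
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(clamp(5, 3)); // 3
    System.out.println(clamp(2, 3)); // 2
  }
}
```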

IntelliJ has a "Run inspection by name..." feature that makes it possible 
to identify all places where curly braces aren't used for control flow 
statements, (which showed over 3300 occurrences in the codebase) and also 
allows them to be automatically inserted, making the fix relatively trivial. 
Since this PR will touch 640 files, I wanted to make sure to first check that 
this is something even worth doing, and, if there's agreement that it is, to 
give reviewers context on what the changes are, the motivation for them, and 
how they were made, to help with the review process.

The draft PR I have up[2] currently has no failing tests and can be marked 
as ready to review if there aren't any objections, and once it is, I'll try to 
coordinate with codeowners to get the minimal number of approvals required for 
a merge (it looks like only 6-7 reviewers are needed, though I'm sure that 
almost every code owner will be tagged as reviewers given the number of files 
touched).

If this idea is a success, I think it would be good to have a discussion 
about other low-hanging code improvements we could make using static analysis 
(unnecessary casts, unused variables, duplicate conditions etc.), and, once a 
particular inspection has been "fixed," possibly consider adding a check for it 
as part of the PR pre-checkin to make sure it's not reintroduced. All thoughts 
and feedback are very welcome.

Donal

[1] https://cwiki.apache.org/confluence/display/GEODE/Code+Style+Guide
[2] https://github.com/apache/geode/pull/6523



[DISCUSS] Pull Request (PR) check list

2021-04-08 Thread Anilkumar Gingade
We have been using a standard checklist for PRs for some time now. It may be 
time to look back and see whether any of its items are obsolete, and to add new 
items based on PR review experience.

Current PR check list items:

  1.  Is there a JIRA ticket associated with this PR? Is it referenced in the 
commit message?
  2.  Has your PR been rebased against the latest commit within the target 
branch (typically develop)?
  3.  Is your initial contribution a single, squashed commit?
  4.  Does gradlew build run cleanly?
  5.  Have you written or updated unit tests to verify your changes?
  6.  If adding new dependencies to the code, are these dependencies licensed 
in a way that is compatible for inclusion under ASF 2.0?
Based on how PRs are created and with the review requirements/process in place, 
checklist items #1-#5 seem to be obsolete.

  *   Ticket numbers are used/referred in PR
  *   The merging option shows if there is any conflict with base repo
  *   Unit tests are run on the PR and reviewers are expected to look for 
new/existing tests to be added/modified.

Adding criteria that we often miss during code changes would be more valuable 
to the checklist, e.g.:

  *   Any serialization changes done; requiring 
backward-compatibility/rolling-upgrade testing
  *   Is there any performance implication with these changes
  *   Was there any RFC for this PR and needs to be updated (link to the RFC)


Please share your thoughts/comments.

Thanks,
-Anil



Re: [Proposal] Backport GEODE-9016 to 1.14, 1.13 and 1.12

2021-03-11 Thread Anilkumar Gingade
+1 to backport. 

On 3/11/21, 11:38 AM, "Jianxia Chen"  wrote:

Hi,

I would like to backport the fix of GEODE-9016 to Geode 1.14, 1.13 and 1.12
branches. This would help resolve the NPE for certain cases of PutAll with
CQ.

Thanks,
Jianxia



Re: [DISCUSS] client/server communications and versioning

2021-02-23 Thread Anilkumar Gingade
Bruce,
>> To solve that problem we currently have to issue a new 1.13 release that 
>> knows about v1.12.1 and users have to roll their servers to the new v1.13.1.
Even if we introduce a client protocol version, users still need to upgrade to 
a server version that understands that protocol, right? E.g., 1.13 may not 
understand the 1.8.1 (or 1.9) protocol.

Also, we currently have client and server versions; this will introduce one 
more versioning requirement (the client/server messaging protocol version). I 
am just wondering whether there will be any additional work/complexity in 
knowing and maintaining the client/server versions and the messaging protocol 
version alongside them...

-Anil.

On 2/23/21, 1:56 PM, "Dan Smith"  wrote:

Ha, I was thinking of suggesting this when I saw Alberto's earlier 
proposal. This does seem like a good idea to only bump the client version when 
the protocol actually changes.

One concern is that it might not be obvious that changing a 
DataSerializableFixedId will change the client protocol. Some objects get sent 
or received from the client and some don't, but we don't have a clear 
indication which is which. Is there some way that we could know when changing a 
DataSerializableFixedId if it is involved in the client protocol or not?

I also wonder if this will affect the WAN - do we want to keep sending the 
current product version with the WAN, or use the client protocol version?

-Dan

From: Bruce Schuchardt 
Sent: Tuesday, February 23, 2021 9:38 AM
To: dev@geode.apache.org 
Subject: [DISCUSS] client/server communications and versioning

I’m considering a change in client/server communications that I would like 
feedback on.

We haven’t changed on-wire client/server communications since v1.8 yet we 
tie these communications to the current version.  The support/1.14 branch 
identifies clients as needing v1.14 for serialization/deserialization, for 
instance, even though nothing has changed in years.

If we put out a patch release, say v1.12.1, clients running that patch 
version cannot communicate with servers running v1.12.0.  They also can’t 
communicate with a server running v1.13.0 because that server doesn’t know 
anything about v1.12.1 and will reject the client.  To solve that problem we 
currently have to issue a new 1.13 release that knows about v1.12.1 and users 
have to roll their servers to the new v1.13.1.

I propose to change this so that the client’s on-wire version is decoupled 
from the “current version”.  A client can be running v1.14.0 but could use 
v1.8.0 as its protocol version for communications.

This would have an impact on contributors to the project.  If you need to 
change the client/server protocol version you will need to modify 
KnownVersion.java to specify the change, and should let everyone know about the 
change.

See https://issues.apache.org/jira/browse/GEODE-8963
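To illustrate why the decoupling helps (a toy sketch, not Geode's actual handshake code or version ordinals):

```java
import java.util.Set;

public class ProtocolHandshakeSketch {
  // Wire-protocol versions this hypothetical server knows how to speak.
  static final Set<String> KNOWN_PROTOCOLS = Set.of("1.8.0");

  // The server validates only the wire-protocol version the client sends,
  // not the client's product version.
  static boolean accepts(String clientProtocolVersion) {
    return KNOWN_PROTOCOLS.contains(clientProtocolVersion);
  }

  public static void main(String[] args) {
    // A 1.12.1 client that still speaks the 1.8.0 protocol is accepted even
    // though this server was released before 1.12.1 existed.
    System.out.println(accepts("1.8.0"));  // true
    // Tying the handshake to the product version instead would reject it:
    System.out.println(accepts("1.12.1")); // false
  }
}
```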



Adding 1.14 blocker label for GEODE-8671

2021-02-19 Thread Anilkumar Gingade
We are investigating GEODE-8671; while the investigation is in progress, we 
would like to treat this as a 1.14 blocker.

https://issues.apache.org/jira/browse/GEODE-8671

Thanks,
-Anil.




Re: [DISCUSSION] Should We Backport Publishing of Geode Tomcat Module

2021-01-11 Thread Anilkumar Gingade
Is there a user request to use this in an older version?
How easy is it to backport?
From the comments, it looks like it is needed for the Geode artifacts published 
to Maven; is that true?

If there is no user request, and there is another way to include the Tomcat 
session module, my view is not to backport. But I am not an expert in this 
area; if there is a recommendation to backport, I am fine with it. And if it 
has to be backported, it should be to the versions that are widely used and 
above (say 1.10 and above, again depending on how easy it is to backport)...

-Anil.


On 1/11/21, 10:10 AM, "Sarah Abbey"  wrote:

Hey, Geode Devs!

Ben Ross and I are currently working on session state in Geode.  In order 
to include the Geode Tomcat session module in the Geode artifacts published to 
Maven, we had to update the module so it publishes to Maven (as seen in this 
PR
 and this 
PR).
  This change is only on the current develop branch.  We are now wondering if 
these changes should be backported to older versions of Geode.

What does everyone think?  If it should be backported, to which versions 
should it be backported?

Thank you,
Sarah



Re: [DISCUSS] Geode 1.14

2021-01-04 Thread Anilkumar Gingade
My recommendation would be:
- Identify, prioritize, and merge 1.14-related work
- Stabilize. Cut the branch and stabilize again (to test any new changes added 
during the first stabilization period)

-Anil.
 

On 12/18/20, 2:26 PM, "Mark Hanson"  wrote:

I support the cut on a predetermined date, but I will be OK with the 
stabilize-first approach, because I think that having a stable build is a 
prerequisite for any time-based model. Like all things, though, it is a smell 
that we have to do this... The other thing is that specifying a date or a 
window of time is, in my opinion, crucial to ensuring freshly baked features are 
not merged until we cut the release. The window need not be very long; a day or 
two, as an example. With the volume of defects that we need to assess/fix, 
maintaining control of develop seems important.  So I would propose that we 
give notice of when we are looking to cut the branch (once we have made 
adequate determinations for the defects).

Thanks,
Mark

On 12/18/20, 12:09 PM, "Owen Nichols"  wrote:

To summarize this thread so far:
@Robert and @Jens seem to favor “cut then stabilize”
@Alexander and @John seem to favor “stabilize then cut”
No one seems to favor “cut on a predetermined date” (at least for 1.14)

@John also made a creative suggestion that maybe 1.14 doesn’t have to 
be cut from latest develop…what if we cut it from support/1.13 and then 
backport just the redis changes (in parallel with continuing to stabilize 
what’s currently on develop into a 1.15 release).

For now let’s try to proceed on the “stabilize then cut” plan.  All 
committers, please hold off on merging big refactorings or other high-risk 
changes to develop until after the branch is cut.  Let’s regroup next month and 
try to clarify exactly which GEODE Jira tickets we need to focus on to make 
sure 1.14 is our best release.

From: Owen Nichols 
Date: Tuesday, December 1, 2020 at 12:26 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Geode 1.14
If someone wants to propose a list of must-fix Jira tickets before we 
can cut the branch, I see that as a shift from a time-based to feature-based 
branch-cut strategy.  Might be fun to try?

Given the distributed nature of the Geode community, picking a date and 
sticking to it allows decentralized decision-making (each contributor can plan 
on their own what they can finish and/or how they can help get develop as 
stable as possible by that date).

To answer your question: the current state of develop feels “pretty 
good” to me.  Knowing that only critical fixes will be allowed onto the branch 
once cut, the question is really about features.  It sounds like there is redis 
work we’d like to ship.  Anything else nearly done that we should consider 
waiting on?

From: Alexander Murmann 
Date: Monday, November 30, 2020 at 11:57 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Geode 1.14
Hi all,

Thanks, Owen for reminding us all of this topic!

I wonder how we feel about the state of develop right now. If we cut 
1.14 right now, it will make it easier to stabilize and ship it. However, I see 
21 open JIRA tickets affecting 1.14.0. It might be better to have an all-hands 
effort to address as much as possible on develop and then cut 1.14. If we shift 
all attention to 1.14, develop will likely never get better. I'd love to get 
closer to an always shippable develop branch. That should vastly reduce future 
release pain and make everyday development better as well.

Thoughts?

From: Jens Deppe 
Sent: Wednesday, November 25, 2020 20:11
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Geode 1.14

Hi Owen,

Thanks for starting this conversation and especially for volunteering 
as Release Manager!

Since we're already a couple of quarters 'behind', in terms of 
releases, I'd prefer cutting the 1.14 branch ASAP. Leaving it until Feb means 
we'll have 9 months of changes to stabilize. How long might that take to 
finally get shipped? (rhetorical).

--Jens

On 11/25/20, 6:05 PM, "Owen Nichols"  wrote:

The trigger in @Alexander’s July 28 proposal to postpone 1.14 has 
been met (we shipped 1.13).
It’s time to discuss when we want to cut the 1.14 branch.  I will 
volunteer as Release Manager.

Below are all release dates since Geode adopted a time-based 
release cadence.

Minor releases:
1.13   branch cut May 4 2020,  1.13.0 shipped Sep 9 2020
1.12   branch cut Feb 4 2020,  1.12.0 shipped Mar 31 2020
1.11   branch cut Nov 4 2019,  1.11.0 shipped Dec 31 2019
1.10   branch cut Aug 2 2019,  1.10.0 shipped Sep 26 2019
1.9    branch cut Feb 19 2019, 1.9.0 shipped Apr 

Re: create region and clear entries from geode-client

2020-12-17 Thread Anilkumar Gingade
The doc you are pointing to shows how to create a region using functions.
You can use the samples given there and modify them per your requirements, or 
create a new one.

Here is the reference doc about function execution:
https://geode.apache.org/docs/guide/14/developing/function_exec/function_execution.html

The clear() op is supported on replicated regions but not on partitioned regions. 
You can create a function to achieve the clear (similar to creating a region), 
either by calling clear() or by deleting the region entries while iterating over them.
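As a sketch of that function-based approach, here is what a server-side function that clears a partitioned region by deleting its local entries could look like. This is illustrative only (the class name is made up, and exact Function API details vary slightly across Geode versions); it uses the standard Function execution API mentioned in the doc above.

```java
import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.partition.PartitionRegionHelper;

// Illustrative sketch: "clears" a partitioned region by removing the
// entries each member hosts locally.
public class ClearPartitionedRegionFunction implements Function<Void> {

  @Override
  public void execute(FunctionContext<Void> context) {
    RegionFunctionContext rfc = (RegionFunctionContext) context;
    // Restrict the iteration to this member's local data so that, with
    // optimizeForWrite() routing execution to primaries, each entry is
    // removed exactly once across the cluster.
    Region<Object, Object> localData =
        PartitionRegionHelper.getLocalDataForContext(rfc);
    for (Object key : localData.keySet()) {
      localData.remove(key);
    }
    context.getResultSender().lastResult(null);
  }

  @Override
  public String getId() {
    return "ClearPartitionedRegionFunction";
  }

  @Override
  public boolean optimizeForWrite() {
    return true; // execute on primary buckets
  }
}
```

A client would then invoke it with something like FunctionService.onRegion(region).execute("ClearPartitionedRegionFunction").getResult(), assuming the function class is deployed on the servers.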

-Anil.


On 12/17/20, 4:51 AM, "ankit Soni"  wrote:

Hello,

I have started using geode (1.12) in recent time and  looking to achieve
the following from a *java based geode-client program.*

 1.* create REPLICATED and PARTITION regions from the client itself and
while creating them, need to provide a config that deletes the entries at a
specified time.*

 2. *clear entries of a region from the client..*

I am exploring the geode doc and have come across the following, but I am unable
to understand what code I need to write in the geode-client java program.

https://geode.apache.org/docs/guide/14/developing/region_options/dynamic_region_creation.html

Can someone guide me on how to achieve this...?

-Ankit.



Re: [PROPOSAL] backporting GEODE-8764 to 1.13 and 9.10

2020-12-04 Thread Anilkumar Gingade
Gester, you mentioned 9.10; did you mean Geode 1.12?

+1 for backporting.

-Anil.


On 12/3/20, 10:44 PM, "Xiaojian Zhou"  wrote:

GEODE-8764 is an enhanced version of GEODE-6930.

Lucene functions should only require DATA:READ permission on the specified 
region, no need to gain permission on other unrelated regions.

The fix has no risk.

Regards
Xiaojian Zhou



Re: [PROPOSAL] Change the default value of conserve-sockets to false

2020-12-04 Thread Anilkumar Gingade
ver1:53948):41001]

2. The ServerConnection thread in server2 uses the shared Connection to 
server1:

ServerConnection on port 60463 Thread 1: ConnectionTable.get using 
sharedConnection=192.168.1.8(server1:53948):41001(uid=3); 
socket=Socket[addr=/192.168.1.8,port=56562,localport=60458]; time=1607039137587

3. The shared P2P message reader in server1 handles the 
UpdateWithContextMessage and sends the ReplyMessage using the shared Connection 
to server2 even though conserve-sockets=false:

P2P message reader for 192.168.1.8(server2:53949):41002 shared ordered 
uid=3 local port=56562 remote port=60458: TestDistributionMessageObserver 
operation=beforeProcessMessage; time=1607039137588; 
message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]
P2P message reader for 192.168.1.8(server2:53949):41002 shared ordered 
uid=3 local port=56562 remote port=60458: TestDistributionMessageObserver 
operation=beforeSendMessage; time=1607039137588; message=ReplyMessage 
processorId=42 from null; recipients=[192.168.1.8(server2:53949):41002]
P2P message reader for 192.168.1.8(server2:53949):41002 shared ordered 
uid=3 local port=56562 remote port=60458: ConnectionTable.get using 
sharedConnection=192.168.1.8(server2:53949):41002(uid=2); 
socket=Socket[addr=192.168.1.8/192.168.1.8,port=46868,localport=60454]; 
time=1607039137588
P2P message reader for 192.168.1.8(server2:53949):41002 shared ordered 
uid=3 local port=56562 remote port=60458: TestDistributionMessageObserver 
operation=afterProcessMessage; time=1607039137589; 
message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]

4. The shared P2P message reader in server2 handles the ReplyMessage:

P2P message reader for 192.168.1.8(server1:53948):41001 shared 
unordered uid=2 local port=46868 remote port=60454: 
TestDistributionMessageObserver operation=beforeProcessMessage; 
time=1607039137589; message=ReplyMessage processorId=42 from 
192.168.1.8(server1:53948):41001; recipients=[null]
P2P message reader for 192.168.1.8(server1:53948):41001 shared 
unordered uid=2 local port=46868 remote port=60454: 
TestDistributionMessageObserver operation=afterProcessMessage; 
time=1607039137589; message=ReplyMessage processorId=42 from 
192.168.1.8(server1:53948):41001; recipients=[null]




From: Bruce Schuchardt 
Sent: Thursday, December 3, 2020 8:18 AM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to 
false

+1 for having the default be conserve-sockets=false.   Any time there has 
been trouble and conserve-sockets=true is involved we always suggest changing 
it to false.
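For reference, the setting under discussion is a per-member configuration in gemfire.properties (file location and surrounding settings here are just an example):

```properties
# gemfire.properties -- per-member setting discussed in this thread.
# false: peer-to-peer messaging threads get their own sockets
#        (more sockets, less contention on shared connections)
# true:  threads share a small number of sockets (the old default)
conserve-sockets=false
```

Each server reads this at startup, which is why a cluster can end up with a mix of values, as the test results below explore.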


On 12/3/20, 6:58 AM, "Anilkumar Gingade"  wrote:

I was conversing with a few of the devs about the requirement for different 
settings/configurations for sets of nodes in the cluster depending on the 
business/application needs; for example, a set of nodes serving a different kind of 
application requirement (data store) than other nodes in the cluster 
(computation heavy). I am calling this heterogeneous cluster configuration 
(mostly in large clusters) as compared to homogeneous cluster (same config across 
all the nodes). We need to be thinking about both kinds of deployments, as business 
models are moving more and more towards cloud-based services for the entire org.
We need to be thinking about auto-setting of configuration values 
(dynamic) based on load, resource availability, and service agreements. We 
should plan on taking a few of these settings and building logic where they can be 
automatically adjusted.

Sorry for diverting from the actual email thread subject.

Barry, it’s a great find. Will there be a dedicated channel for 
communication from the node where conserve-sockets is set to false to the remote 
nodes?

 -Anil.

On 12/2/20, 3:14 PM, "Barrett Oglesby"  wrote:

I ran a bunch of tests using the long-running-test code where the 
servers had a mix of conserve-sockets settings, and they all worked ok.

One set of tests had 6 servers - 3 with conserve-sockets=false and 
3 with conserve-sockets=true.

Another set of tests had 4 servers - 3 with conserve-sockets=false 
and 1 with conserve-sockets=true.

In each case, the multi-threaded client did:

- puts
- gets
- destroys
- function updates
- oql queries

One thing I found interesting was the server where the operation 
originated dictated which thread was used on the remote server. If the server 
where the operation originated had conserve-sockets=false, then the remote 
server used an unshared P2P message reader to process the replication no matter 
what its conserve-sockets setting was. And if the server where the operation 
originated had conserve-sockets=true, then the remote server used a shared 

Re: [PROPOSAL] Change the default value of conserve-sockets to false

2020-12-03 Thread Anilkumar Gingade
I was conversing with a few of the devs about the requirement for different 
settings/configurations for sets of nodes in the cluster depending on the 
business/application needs; for example, a set of nodes serving a different kind of 
application requirement (data store) than other nodes in the cluster 
(computation heavy). I am calling this heterogeneous cluster configuration 
(mostly in large clusters) as compared to homogeneous cluster (same config across 
all the nodes). We need to be thinking about both kinds of deployments, as business 
models are moving more and more towards cloud-based services for the entire org.
We need to be thinking about auto-setting of configuration values (dynamic) 
based on load, resource availability, and service agreements. We should plan 
on taking a few of these settings and building logic where they can be automatically 
adjusted.

Sorry for diverting from the actual email thread subject.

Barry, it’s a great find. Will there be a dedicated channel for communication 
from the node where conserve-sockets is set to false to the remote nodes?

 -Anil.

On 12/2/20, 3:14 PM, "Barrett Oglesby"  wrote:

I ran a bunch of tests using the long-running-test code where the servers 
had a mix of conserve-sockets settings, and they all worked ok.

One set of tests had 6 servers - 3 with conserve-sockets=false and 3 with 
conserve-sockets=true.

Another set of tests had 4 servers - 3 with conserve-sockets=false and 1 
with conserve-sockets=true.

In each case, the multi-threaded client did:

- puts
- gets
- destroys
- function updates
- oql queries

One thing I found interesting was the server where the operation originated 
dictated which thread was used on the remote server. If the server where the 
operation originated had conserve-sockets=false, then the remote server used an 
unshared P2P message reader to process the replication no matter what its 
conserve-sockets setting was. And if the server where the operation originated 
had conserve-sockets=true, then the remote server used a shared P2P message 
reader to process the replication no matter what its conserve-sockets setting 
was.

Here is some logging from a DistributionMessageObserver that shows that 
behavior.

Case 1:

The server (server1) that processes the put operation from the client is 
primary and has conserve-sockets=false.
The server (server2) that handles the UpdateWithContextMessage has 
conserve-sockets=true.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 60802 Thread 4: TestDistributionMessageObserver 
operation=beforeSendMessage; time=1606929894787; 
message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); 
recipients=[192.168.1.8(server-conserve-sockets1:58995):41002]

2. An unshared P2P message reader in server2 handles the 
UpdateWithContextMessage even though conserve-sockets=true:

P2P message reader for 192.168.1.8(server1:58984):41001 unshared 
ordered uid=11 dom #1 local port=58405 remote port=60860: 
DistributionMessage.schedule 
msg=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; 
sender=192.168.1.8(server1:58984):41001; op=UPDATE; key=0; 
newValue=(10485820 bytes))
P2P message reader for 192.168.1.8(server1:58984):41001 unshared 
ordered uid=11 dom #1 local port=58405 remote port=60860: 
TestDistributionMessageObserver operation=beforeProcessMessage; 
time=1606929894809; message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984):41001; 
op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
P2P message reader for 192.168.1.8(server1:58984):41001 unshared 
ordered uid=11 dom #1 local port=58405 remote port=60860: 
TestDistributionMessageObserver operation=afterProcessMessage; 
time=1606929894810; message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984):41001; 
op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

Case 2:

The server (server1) that processes the put operation from the client is 
primary and has conserve-sockets=true.
The server (server2) that handles the UpdateWithContextMessage has 
conserve-sockets=false.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 61474 Thread 1: TestDistributionMessageObserver 
operation=beforeSendMessage; time=1606932400283; 
message=UpdateOperation$UpdateWithContextMessage(region 
path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); 
recipients=[192.168.1.8(server1:63224):41001]

2. The shared P2P message reader in server2 handles the 
UpdateWithContextMessage and sends the ReplyMessage even though 
conserve-sockets=false:

P2P message reader for 
192.168.1.8(server-conserve-sockets1:63240):41002 shared 

Re: Geode - store and query JSON documents

2020-11-24 Thread Anilkumar Gingade
Ankit,

Here is how to query col2.
"SELECT d.col2 FROM /JsonRegion v, v.data d, d.col2 c where c.k21 = '22'";

You can find example on how to query nested collections:
https://geode.apache.org/docs/guide/18/getting_started/querying_quick_reference.html

When you want to select a nested collection and inspect its values, you need to 
create an iterator in the FROM clause (e.g., d.col2 in the above query).

You can find other ways to query arrays in the above sample.
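A minimal client-side sketch of executing that query (locator host/port and the region name are placeholders carried over from the earlier example):

```java
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.query.QueryService;
import org.apache.geode.cache.query.SelectResults;

// Illustrative sketch: run the nested-collection OQL query from a client.
public class JsonQueryExample {
  public static void main(String[] args) throws Exception {
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334) // placeholder locator
        .create();

    // Iterate the nested array col2 in the FROM clause so its elements
    // can be filtered in the WHERE clause.
    String queryStr =
        "SELECT d.col2 FROM /JsonRegion v, v.data d, d.col2 c WHERE c.k21 = '22'";
    QueryService qs = cache.getQueryService();
    SelectResults<?> results = (SelectResults<?>) qs.newQuery(queryStr).execute();
    results.forEach(r -> System.out.println("Matched: " + r));

    cache.close();
  }
}
```

This assumes the region already holds PdxInstance values produced by JSONFormatter.fromJSON, as in the earlier client code.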

-Anil.



On 11/23/20, 10:02 PM, "ankit Soni"  wrote:

Hi Anil,

Thanks a lot for your reply. This really helps to proceed. The query shared
by you worked, but I need a slight variation of it, i.e., the where clause
contains col2 (data.col2.k21 = '22'), which is an array, unlike col1 (an
object).

FYI: value is stored in cache.
PDX[28847624, __GEMFIRE_JSON]{
data=[PDX[28847624, __GEMFIRE_JSON] {
col1=PDX[28626794, __GEMFIRE_JSON] {k11=aaa, k12=true, k13=,
k14=2020-12-31T00..}
Col2=[PDX[25385544, __GEMFIRE_JSON]{k21=, k22=true}]}]}
Based on the OQL querying doc shared, I tried a few ways but had no luck querying
based on Col2.

It will be really helpful if you share updated query.

Thanks
Ankit.

On Tue, Nov 24, 2020, 2:42 AM Anilkumar Gingade  wrote:

> Ankit,
>
> Here is how you can query your JSON object.
>
> String queryStr = "SELECT d.col1 FROM /JsonRegion v, v.data d where
> d.col1.k11 = 'aaa'";
>
> As replied earlier; the data is stored as PdxInstance type in the cache.
> In the PdxInstance, the data is stored as top level or nested collection 
of
> objects/values based on input JSON object structure.
> The query engine queries on the PdxInstance type and returns the value.
>
> To see, how the PdxInstance data looks like in the cache, you can print
> the returned value from querying the region values:
> E.g.:
>  String queryStr = "SELECT v FROM /JsonRegion v";
>  SelectResults results = (SelectResults)
> QueryService().newQuery(queryStr).execute();
>   Object[] value = results.asList().toArray();
>   System.out.println(" Projected value: " + value[0]);
>
> You can find sample queries on different type of objects (collections,
> etc) at:
>
> 
https://geode.apache.org/docs/guide/18/getting_started/querying_quick_reference.html
>
> Also in order to determine where the time is getting spent, can you
> separate out object creation through JSONFormatter from put operation.
> E.g.:
> PdxInstance pdxInstance = JSONFormatter.fromJSON(jsonDoc_2);
> // Time taken to format:
> region.put("1", pdxInstance);
> // Time taken to add to cache:
>
> And measure the time separately. It will help to see if the time is spent
> in getting the PdxInstance or in doing puts. Also, can you measure the 
time
> in avg.
> E.g. Say time measured for puts from 1000 to 2000 and avg time for those
> puts.
>
> -Anil.
>
>
> On 11/23/20, 11:27 AM, "ankit Soni"  wrote:
>
>  Hello geode-dev,
>
> I am *evaluating usage of Geode (1.12) with storing JSON documents and
> querying the same*. I am able to store the json records successfully 
in
> geode but seeking guidance on how to query them.
> More details on code and sample json is,
>
>
> *Sample client-code*
>
> import org.apache.geode.cache.client.ClientCache;
> import org.apache.geode.cache.client.ClientCacheFactory;
> import org.apache.geode.cache.client.ClientRegionShortcut;
> import org.apache.geode.pdx.JSONFormatter;
> import org.apache.geode.pdx.PdxInstance;
>
> public class MyTest {
>
> *//NOTE: Below is truncated json, single json document can max
> contain an array of col1...col30 (30 diff attributes) within data. *
> public final static  String jsonDoc_2 = "{" +
> "\"data\":[{" +
> "\"col1\": {" +
> "\"k11\": \"aaa\"," +
> "\"k12\":true," +
> "\"k13\": ," +
> "\"k14\": \"2020-12-31:00:00:00\"" +
> "}," +
> "\"col2\":[{" +
> 

Re: Geode - store and query JSON documents

2020-11-23 Thread Anilkumar Gingade
Ankit,

Here is how you can query your JSON object.

String queryStr = "SELECT d.col1 FROM /JsonRegion v, v.data d where d.col1.k11 
= 'aaa'";

As replied earlier, the data is stored as a PdxInstance type in the cache. In the 
PdxInstance, the data is stored as a top-level or nested collection of 
objects/values based on the input JSON object structure. 
The query engine queries on the PdxInstance type and returns the value.

To see how the PdxInstance data looks in the cache, you can print the 
returned value from querying the region values:
E.g.:
 String queryStr = "SELECT v FROM /JsonRegion v";
 SelectResults results = (SelectResults) 
QueryService().newQuery(queryStr).execute();
  Object[] value = results.asList().toArray();
  System.out.println(" Projected value: " + value[0]);

You can find sample queries on different type of objects (collections, etc) at:
https://geode.apache.org/docs/guide/18/getting_started/querying_quick_reference.html

Also, in order to determine where the time is being spent, can you separate 
out object creation through JSONFormatter from the put operation?
E.g.:
PdxInstance pdxInstance = JSONFormatter.fromJSON(jsonDoc_2);
// Time taken to format:
region.put("1", pdxInstance);
// Time taken to add to cache:

And measure the time separately. It will help to see whether the time is spent in 
getting the PdxInstance or in doing puts. Also, can you measure the average time? 
E.g., say, the time measured for puts 1000 to 2000 and the average time for those 
puts. 

-Anil.


On 11/23/20, 11:27 AM, "ankit Soni"  wrote:

 Hello geode-dev,

I am *evaluating usage of Geode (1.12) with storing JSON documents and
querying the same*. I am able to store the json records successfully in
geode but seeking guidance on how to query them.
More details on code and sample json is,


*Sample client-code*

import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.geode.pdx.JSONFormatter;
import org.apache.geode.pdx.PdxInstance;

public class MyTest {

*//NOTE: Below is truncated json, single json document can max
contain an array of col1...col30 (30 diff attributes) within data. *
public final static  String jsonDoc_2 = "{" +
"\"data\":[{" +
"\"col1\": {" +
"\"k11\": \"aaa\"," +
"\"k12\":true," +
"\"k13\": ," +
"\"k14\": \"2020-12-31:00:00:00\"" +
"}," +
"\"col2\":[{" +
"\"k21\": \"22\"," +
"\"k22\": true" +
"}]" +
"}]" +
"}";

* //NOTE: Col1col30 are mix of JSONObject ({}) and JSONArray
([]) as shown above in jsonDoc_2;*

public static void main(String[] args){

//create client-cache
ClientCache cache = new
ClientCacheFactory().addPoolLocator(LOCATOR_HOST, PORT).create();
Region region = cache.createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
.create(REGION_NAME);

//store json document
region.put("key", JSONFormatter.fromJSON(jsonDoc_2));

//How to query json document like,

// 1. select col2.k21, col1, col20 from /REGION_NAME where
data.col2.k21 = '22' OR data.col2.k21 = '33'

// 2. select col2.k21, col1.k11, col1 from /REGION_NAME where
data.col1.k11 in ('aaa', 'xxx', 'yyy')
}
}

*Server: Region-creation*

gfsh> create region --name=REGION_NAME --type=PARTITION
--redundant-copies=1 --total-num-buckets=61


*Setup: Distributed cluster of 3 nodes
*

*My Observations/Problems*
-  Put operation takes excessive time: region.put("key",
JSONFormatter.fromJSON(jsonDoc_2));  - Fetching a single record from () a
file and Storing in geode approx. takes . 3 secs
   Are there any suggestions/configurations related to the JSONFormatter API or
otherwise to optimize this...?

*Looking forward to guidance on querying this JOSN for above sample
queries.*

*Thanks*
*Ankit.*



Re: Geode - store and query JSON documents

2020-11-23 Thread Anilkumar Gingade
Gester, looking at the sample query, I believe Ankit is asking about OQL queries, 
not Lucene...

-Anil.


On 11/23/20, 9:02 AM, "Xiaojian Zhou"  wrote:

Ankit:

Geode provides Lucene queries on JSON fields. Your query can be supported. 

https://gemfire.docs.pivotal.io/910/geode/tools_modules/lucene_integration.html

However, the above document does not provide a query example on a JSON 
object. 

I can give you some sample code to query on JSON.

Regards
Xiaojian Zhou

On 11/22/20, 11:53 AM, "ankit Soni"  wrote:

Hello geode-devs, please provide a guidance on this.

Ankit.

On Sat, 21 Nov 2020 at 10:23, ankit Soni  
wrote:

> Hello team,
>
> I am *evaluating usage of Geode (1.12) with storing JSON documents and
> querying the same*. I am able to store the json records successfully 
in
> geode but seeking guidance on how to query them.
> More details on code and sample json is,
>
>
> *Sample client-code*
>
> import org.apache.geode.cache.client.ClientCache;
> import org.apache.geode.cache.client.ClientCacheFactory;
> import org.apache.geode.cache.client.ClientRegionShortcut;
> import org.apache.geode.pdx.JSONFormatter;
> import org.apache.geode.pdx.PdxInstance;
>
> public class MyTest {
>
> *//NOTE: Below is truncated json, single json document can max 
contain an array of col1...col30 (30 diff attributes) within data. *
> public final static  String jsonDoc_2 = "{" +
> "\"data\":[{" +
> "\"col1\": {" +
> "\"k11\": \"aaa\"," +
> "\"k12\":true," +
> "\"k13\": ," +
> "\"k14\": \"2020-12-31:00:00:00\"" +
> "}," +
> "\"col2\":[{" +
> "\"k21\": \"22\"," +
> "\"k22\": true" +
> "}]" +
> "}]" +
> "}";
>
> * //NOTE: Col1col30 are mix of JSONObject ({}) and JSONArray 
([]) as shown above in jsonDoc_2;*
>
> public static void main(String[] args){
>
> //create client-cache
> ClientCache cache = new 
ClientCacheFactory().addPoolLocator(LOCATOR_HOST, PORT).create();
> Region region = cache.createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
> .create(REGION_NAME);
>
> //store json document
> region.put("key", JSONFormatter.fromJSON(jsonDoc_2));
>
> //How to query json document like,
>
> // 1. select col2.k21, col1, col20 from /REGION_NAME where 
data.col2.k21 = '22' OR data.col2.k21 = '33'
>
> // 2. select col2.k21, col1.k11, col1 from /REGION_NAME where 
data.col1.k11 in ('aaa', 'xxx', 'yyy')
> }
> }
>
> *Server: Region-creation*
>
> gfsh> create region --name=REGION_NAME --type=PARTITION 
--redundant-copies=1 --total-num-buckets=61
>
>
> *Setup: Distributed cluster of 3 nodes
> *
>
> *My Observations/Problems*
> -  Put operation takes excessive time: region.put("key",
> JSONFormatter.fromJSON(jsonDoc_2));  - Fetching a single record from 
() a
> file and Storing in geode approx. takes . 3 secs
>Is there any suggestions/configuration related to JSONFormatter 
API or
> other to optimize this...?
>
> *Looking forward to guidance on querying this JOSN for above sample
> queries.*
>
> *Thanks*
> *Ankit*
>




Re: Apache Geode 1.13.1 patch proposal

2020-11-12 Thread Anilkumar Gingade
+1

On 11/12/20, 11:34 AM, "Owen Nichols"  wrote:

+1 Sounds good to me, thanks @Dick for stepping up!

Let's also start posting Geode release artifacts to GitHub too (as many 
other projects already do).  I've backfilled the last couple releases, check it 
out here: 
https://github.com/apache/geode/releases

On 11/12/20, 11:01 AM, "Dick Cavender"  wrote:

It's been two months since the 1.13.0 release and there have been 28 
important fixes on support/1.13 that the community would benefit from. Based on 
this I'd like to propose release of Apache Geode 1.13.1 based on the current 
support/1.13 branch. I'll volunteer to be the release manager for 1.13.1 so 
look forward to an RC1 soon.

-Dick





Re: [PROPOSAL] Backport GEODE-8608 to support 1.13, 1.12 branch

2020-10-14 Thread Anilkumar Gingade
+1 
After the PR pipeline is completed.

-Anil.

On 10/14/20, 1:32 PM, "Xiaojian Zhou"  wrote:

Hi,

There’s a race where StateFlush could hang when the target member is 
shut down. GEODE-8608 fixed it. This fix is a patch on top of GEODE-8385.

The fix should be backported to all previous versions with GEODE-8385.

We are still waiting for precheckin to finish.

Regards
Xiaojian Zhou



Re: [PROPOSAL] backport fix for GEODE-8574 to 1.13.1

2020-10-08 Thread Anilkumar Gingade
+1

On 10/8/20, 7:51 AM, "Jinmei Liao"  wrote:

I would like to include the fix for GEODE-8574 to 1.13.1, it would greatly 
help the Geode on k8s experience.

Thanks!

Jinmei



Re: [Discussion] RFC to make Geode's working directory configurable

2020-10-07 Thread Anilkumar Gingade
Dale, I have few questions that I have added as comments to the RFC.

On 10/6/20, 5:24 PM, "Jacob Barrett"  wrote:

Do we expect this to be used by production code or just test code? If this 
is going to be used by production code I am concerned with introducing another 
singleton class into the mix. We really want to be moving towards a 
non-singleton world where I can have more than one Cache in a JVM. For 
production code this value should probably be retrieved from the Cache, 
DistributedSystem or some child of those instances. If this is for test code 
only then ignore me the above concerns.

> On Oct 6, 2020, at 12:12 PM, Dale Emery  wrote:
> 
> Hi all,
> 
> I have submitted an RFC to make Geode’s working directory configurable: 
https://cwiki.apache.org/confluence/display/GEODE/Make+Geode%27s+Working+Directory+Configurable
> 
> Please review it and comment by Oct 26.
> 
> Cheers,
> Dale
> 




Re: Colocated regions missing some buckets after restart

2020-09-16 Thread Anilkumar Gingade
Mario,

Take a thread dump a couple of times at an interval of a minute. See if you can 
find threads stuck in region creation. This will show if there is any lock 
contention.

-Anil.


On 9/16/20, 6:29 AM, "Mario Kevo"  wrote:

Hi Anil,

From the server logs we see that some threads are stuck, and we continuously get 
the following message on server2 (bucket missing on server2 for the DfSessions 
region):
[warn 2020/09/15 14:25:39.852 CEST  
tid=0x251] 15 secs have elapsed waiting for a primary for bucket [BucketAdvisor 
/__PR/_B__DfSessions_18:935: state=VOLUNTEERING_HOSTING]. Current bucket owners 
[]


And on the other server1:
[warn 2020/09/15 14:25:40.852 CEST  
tid=0xdf] 15 seconds have elapsed while waiting for replies: 
:41003]> on 
192.168.0.145(server1:28031):41002 whose current membership list is: 
[[192.168.0.145(locator1:27244:locator):41000, 
192.168.0.145(locator2:27343:locator):41001, 
192.168.0.145(server1:28031):41002, 192.168.0.145(server2:28054):41003]]

[warn 2020/09/15 14:27:20.200 CEST  tid=0x11] Thread 223 
(0xdf) is stuck

[warn 2020/09/15 14:27:20.202 CEST  tid=0x11] Thread <223> 
(0xdf) that was executed at <15 Sep 2020 14:25:24 CEST> has been stuck for 
<115.361 seconds> and number of thread monitor iteration <1>
Thread Name  state 
...
It seems that this is not a problem with stats.
We suspect that the problem is with some lock, but we need to 
investigate it a bit more.

BR,
Mario



________
From: Anilkumar Gingade 
Sent: 15 September 2020 16:36
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Mario,

I doubt this has anything to do with the client connections. If it did, it 
would be a member-to-member connection issue; in that case the 
unresponsive member is kicked out of the cluster.

The recommended configuration is to have persistent regions for both 
parent and co-located regions (and replicated regions)...

There could be issues in the stats too... Can you try executing 
test/validation code on the server side to dump/list primary and secondary buckets?
You can do that using helper methods: 
pr.getDataStore().getAllLocalPrimaryBucketIds();

-Anil

On 9/14/20, 12:25 AM, "Mario Kevo"  wrote:

Hi,


This problem is usually seen on only one server. The other servers' 
metrics and bucket counts look fine. Another symptom of this issue is that the 
max-connections limit is reached on the problematic server if we have a client 
that tries to reconnect after the server restart. Clients simply get no 
response from the server so they try to close the connection, but the 
connection close is not acknowledged by the server. On server side we see that 
the connections are in CLOSE-WAIT state with packets in the socket receiver 
queue. It’s as if the servers just stopped processing packets on the sockets 
while waiting for a member with the primary bucket.



So in short, each new client connection is “unresponsive”. The client 
tries to close it and open a new one, but the socket doesn't get closed on the server 
side and the connection is left “hanging” on the server. Clients will try to do 
this until max-connections is reached on the servers. This is why we would be 
unable to add any data to the regions. But IMHO it’s really not dependent on 
adding data, since this issue happens occasionally (1 out of ~4 restarts) and 
only on one server.



The initial problem was observed with a persistent region A (with 1 
key-value pairs inserted) and a non-persistent region B collocated with region 
A. We did some tests with both regions being persistent. We haven’t observed 
the same issue yet (although we did only a few restarts), but we observed 
something that also looks quite worrying. Both servers start up without 
reporting issues in the logs. But, looking at the server metrics, one server 
has wrong information about “bucketCount” and is missing primary buckets. E.g:


First server:

Partition | putLocalRate        | 0.0
          | putRemoteRate       | 0.0
          | putRemoteLatency    | 0
          | putRemoteAvgLatency | 0
          | bucketCount         | 113
          | primaryBucketCount  | 57

Second server:

Partition | putLocalRate        | 0.0
          | putRemoteRate       | 0.0
          | putRemoteLatency    | 0
          | putRemoteAvgLatency | 0
          | bucketCount         | 111
          | primaryBucketCount  | 55


So we are missing a primary bucket without being aware of the issue.

BR,
Mario

________
From: Anilkumar Gingade 
Sent: 11 September 2020 20:34
 

Re: Colocated regions missing some buckets after restart

2020-09-15 Thread Anilkumar Gingade
Mario,

I doubt this has anything to do with the client connections. If it did, it would 
be a member-to-member connection issue; in that case the 
unresponsive member is kicked out of the cluster.

The recommended configuration is to have persistent regions for both parent 
and co-located regions (and replicated regions)...

There could be issues in the stats too... Can you try executing 
test/validation code on the server side to dump/list primary and secondary buckets?
You can do that using helper methods: 
pr.getDataStore().getAllLocalPrimaryBucketIds();

-Anil

On 9/14/20, 12:25 AM, "Mario Kevo"  wrote:

Hi,


This problem is usually seen on only one server. The other servers' metrics 
and bucket counts look fine. Another symptom of this issue is that the 
max-connections limit is reached on the problematic server if we have a client 
that tries to reconnect after the server restart. Clients simply get no 
response from the server so they try to close the connection, but the 
connection close is not acknowledged by the server. On server side we see that 
the connections are in CLOSE-WAIT state with packets in the socket receiver 
queue. It’s as if the servers just stopped processing packets on the sockets 
while waiting for a member with the primary bucket.



So in short, each new client connection is “unresponsive”. The client tries 
to close it and open a new one, but the socket doesn’t get closed on the server side 
and the connection is left “hanging” on the server. Clients will try to do this 
until max-connections is reached on the servers. This is why we would be unable 
to add any data to the regions. But IMHO it’s really not dependent on adding 
data, since this issue happens occasionally (1 out of ~4 restarts) and only on 
one server.



The initial problem was observed with a persistent region A (with 1 
key-value pairs inserted) and a non-persistent region B collocated with region 
A. We did some tests with both regions being persistent. We haven’t observed 
the same issue yet (although we did only a few restarts), but we observed 
something that also looks quite worrying. Both servers start up without 
reporting issues in the logs. But, looking at the server metrics, one server 
has wrong information about “bucketCount” and is missing primary buckets. E.g:


First server:

Partition | putLocalRate        | 0.0
          | putRemoteRate       | 0.0
          | putRemoteLatency    | 0
          | putRemoteAvgLatency | 0
          | bucketCount         | 113
          | primaryBucketCount  | 57

Second server:

Partition | putLocalRate        | 0.0
          | putRemoteRate       | 0.0
          | putRemoteLatency    | 0
          | putRemoteAvgLatency | 0
          | bucketCount         | 111
          | primaryBucketCount  | 55


So we are missing a primary bucket without being aware of the issue.

BR,
Mario


    From: Anilkumar Gingade 
Sent: 11 September 2020 20:34
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Are you seeing no buckets for persistent regions or non-persistent ones? The 
buckets are created dynamically, when data is added to the corresponding buckets...
When a server is restarted, in the case of in-memory regions the data is not 
there, so the bucket region may not have been created (my suspicion).
Can you try adding data and see if the co-located bucket region gets 
created on the respective nodes/servers?

-Anil.


On 9/11/20, 9:46 AM, "Mario Kevo"  wrote:

Hi geode-dev,

We have a system with two servers and a few regions. One region is 
persistent and the others are not, but they are colocated with this persistent region.
After the servers restart, we can see that some regions don't have 
any buckets.
gfsh>show metrics --member=server-1 --region=/region1 
--categories=partition
Metrics for region:/region1 On Member server-1


Category  |Metric| Value
- |  | -
partition | putLocalRate | 0.0
  | putRemoteRate| 0.0
  | putRemoteLatency | 0
  | putRemoteAvgLatency  | 0
  | bucketCount  | 0
  | primaryBucketCount   | 0
  | configuredRedundancy | 1
  | actualRedundancy | 0
  | numBucketsWithoutRedundancy  | 113
  | totalBucketSize  | 0

gfsh>show metrics --member=server-0 --region=/region1 
--categories=partition
Metrics for region:/region1 On Member server-0

Category  |   

Re: Colocated regions missing some buckets after restart

2020-09-11 Thread Anilkumar Gingade
Are you seeing no buckets for persistent regions or non-persistent ones? The buckets 
are created dynamically, when data is added to the corresponding buckets...
When a server is restarted, in the case of in-memory regions the data is not 
there, so the bucket region may not have been created (my suspicion). 
Can you try adding data and see if the co-located bucket region gets created on 
the respective nodes/servers?

-Anil.


On 9/11/20, 9:46 AM, "Mario Kevo"  wrote:

Hi geode-dev,

We have a system with two servers and a few regions. One region is 
persistent and the others are not, but they are colocated with this persistent region.
After the servers restart, we can see that some regions don't have any 
buckets.
gfsh>show metrics --member=server-1 --region=/region1 --categories=partition
Metrics for region:/region1 On Member server-1


Category  |Metric| Value
- |  | -
partition | putLocalRate | 0.0
  | putRemoteRate| 0.0
  | putRemoteLatency | 0
  | putRemoteAvgLatency  | 0
  | bucketCount  | 0
  | primaryBucketCount   | 0
  | configuredRedundancy | 1
  | actualRedundancy | 0
  | numBucketsWithoutRedundancy  | 113
  | totalBucketSize  | 0

gfsh>show metrics --member=server-0 --region=/region1 --categories=partition
Metrics for region:/region1 On Member server-0

Category  |Metric| Value
- |  | -
partition | putLocalRate | 0.0
  | putRemoteRate| 0.0
  | putRemoteLatency | 0
  | putRemoteAvgLatency  | 0
  | bucketCount  | 113
  | primaryBucketCount   | 56
  | configuredRedundancy | 1
  | actualRedundancy | 0
  | numBucketsWithoutRedundancy  | 113
  | totalBucketSize  | 0


The persistent region is OK, but some of these colocated regions have this 
issue. We also waited some time, but it doesn't change.

Does anyone have an idea about this problem and what is causing the issue?
The issue can be easily reproduced with two locators, two servers, one 
persistent region and a few non-persistent regions colocated with the persistent one.
After restarting both servers, running the show metrics command will show 
this issue for some regions.

BR,
Mario




Re: Question on how Geode handles data on Disk

2020-09-08 Thread Anilkumar Gingade
Amit,

You can find high level details at:
https://geode.apache.org/docs/guide/112/developing/storing_data_on_disk/chapter_overview.html

Geode keeps the key always in memory. Geode creates different region entries 
(key-value pairs) based on the region configuration and how/where data is 
stored. 
When the application tries to retrieve the data for a given key, it knows where to 
look for it on disk and loads only that value into memory. 
There are certain operations, like querying, where bringing the data into memory 
is avoided, as those do a region/index scan.
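The layout described above (keys in memory, values recoverable from disk) is configured per region. A hypothetical cache.xml sketch follows, where the store name and directory are placeholders; PARTITION_PERSISTENT_OVERFLOW is one of the stock region shortcuts that both persists values and allows them to be evicted from memory and faulted back in on demand:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<cache xmlns="http://geode.apache.org/schema/cache"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://geode.apache.org/schema/cache
                           http://geode.apache.org/schema/cache/cache-1.0.xsd"
       version="1.0">
  <!-- Placeholder disk store; values live in these files, keys stay in memory. -->
  <disk-store name="exampleStore">
    <disk-dirs>
      <disk-dir>/var/geode/exampleStore</disk-dir>
    </disk-dirs>
  </disk-store>
  <!-- Persistent partitioned region whose values may overflow to the store. -->
  <region name="exampleRegion" refid="PARTITION_PERSISTENT_OVERFLOW">
    <region-attributes disk-store-name="exampleStore"/>
  </region>
</cache>
```

With this configuration, a get() for a key whose value has been evicted reads just that value back from the disk store, matching the per-value retrieval described above.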

-Anil.


On 9/7/20, 11:57 PM, "Amit Pandey"  wrote:

I agree it's kind of academic, but any answers will be appreciated. I just
wanted to understand the performance characteristics when using disk
persistence.

On Sun, Sep 6, 2020 at 11:54 PM Amit Pandey 
wrote:

> What I meant here by "Also if I request data for ID 1 will it bring it only
> from disk or" was: if I request a tuple which is not in memory and is
> on disk,
>   1) How does Geode know it's on disk?
>   2) Does it bring only that tuple (value) into memory, or the whole
> page?
>
> Regards
>
>
> On Sun, Sep 6, 2020 at 11:36 PM Amit Pandey 
> wrote:
>
>> Hi Geode Devs,
>>
>> How does Geode handle data on disk? How does Geode mark that some data
>> of a region is on disk? Is it basically what we call the tombstone
>> approach? Also, if I request data for ID 1, will it bring it only from
>> disk or
>>
>>
>> Secondly, how does Geode handle updates of data on disk? So does it
>> discard the data on disk when it brings a tuple into memory and writes a
>> new file to disk?
>>
>> Regards
>>
>>



Re: [PROPOSAL] Backport GEODE-8475 to 1.13

2020-09-02 Thread Anilkumar Gingade
+1 As it addresses a potential hang. 

-Anil.


On 9/2/20, 10:38 AM, "Xiaojian Zhou"  wrote:

Hi, All:

I want to backport my fix for GEODE-8475 to 1.13. It fixed a hang caused by 
a potential deadlock.

This fix is quite safe; I have verified it by running all queue-related 
regression tests.

Regards
Gester



[PROPOSAL] backport GEODE-8394 to support/1.13

2020-08-07 Thread Anilkumar Gingade
This causes a large object to be partially stored (as corrupt data) in the cache 
instead of an exception being thrown.



Re: [PROPOSAL] Cherry pic GEODE-8331 to support branches

2020-07-22 Thread Anilkumar Gingade
+1 
This will provide a consistent experience for our end users from the 1.10 release 
version onward.

On 7/22/20, 2:23 PM, "Jinmei Liao"  wrote:

I would like to propose cherry-picking GEODE-8331: allow GFSH to connect to 
other versions of cluster (#5375) to the support branches up to 1.10. This would 
allow gfsh to connect to other versions of the cluster and provide better error 
messages when a command is not supported by the connected cluster.

Jinmei



Re: negative ActiveCQCount

2020-07-01 Thread Anilkumar Gingade
Seems like a bug to me. Can you please create a Jira ticket?

The active CQ counts will be more meaningful at member level; they could be 
different on different servers based on the CQs registered and the redundancy 
level set. And that helps to determine the load on each server.

-Anil. 

On 7/1/20, 5:52 AM, "Mario Kevo"  wrote:

Hi Kirk, thanks for the response!

I just realized that I wrongly described the problem, as I tried so many 
cases. Sorry!

We have a system with two servers. If the redundancy is 0, then, as expected, 
the first server has activeCqCount=1 and the second has 
activeCqCount=0.
After closing the CQ we get activeCqCount=0 on the first server and 
activeCqCount=-1 on the second.
gfsh>show metrics --categories=query
Cluster-wide Metrics

Category |  Metric  | Value
 |  | -
query| activeCQCount| -1
 | queryRequestRate | 0.0


In case we set redundancy to 1, the count increments properly as expected, on both 
servers by one. But when the CQ is closed we get activeCqCount=-1 on both servers. 
And the show metrics command has the following output:
gfsh>show metrics --categories=query
Cluster-wide Metrics

Category |  Metric  | Value
 |  | -
query| activeCQCount| -1
 | queryRequestRate | 0.0

What I found is that when a CQ is registered on one server, a message is sent 
to the other servers in the system with opType=REGISTER_CQ, and in that case a 
new instance of ServerCqImpl is created on the second server (with the empty 
constructor of ServerCqImpl). When we close the CQ there are two different 
instances on the servers and it closes both of them; as both are in the RUNNING 
state before closing, it decrements activeCqCount for both of them.
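Mario's description can be reduced to a self-contained illustration. The class names below are stand-ins, not Geode's actual implementation: one registration increments the aggregate stat once, but both the registering server's instance and the copy deserialized via REGISTER_CQ are RUNNING, so close() decrements twice:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CqCountBug {
    // Aggregated stat, as the DistributedSystemMXBean would sum it cluster-wide.
    static final AtomicInteger activeCqCount = new AtomicInteger();

    // Stand-in for ServerCQImpl: a RUNNING instance decrements the stat on close.
    static class ServerCq {
        boolean running = true;

        void close() {
            if (running) {
                running = false;
                activeCqCount.decrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        activeCqCount.incrementAndGet();     // one registration, counted once
        ServerCq primary = new ServerCq();   // instance on the registering server
        ServerCq secondary = new ServerCq(); // instance deserialized via REGISTER_CQ
        primary.close();
        secondary.close();                   // second decrement for one registration
        System.out.println(activeCqCount.get()); // prints -1
    }
}
```

This mirrors the redundancy-0 case: count goes 1, then 0, then -1. A fix along the lines Mario asks about would decrement only when the instance owns the registration (e.g. on the primary), which is an assumption about the design rather than existing Geode behavior.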

BR,
Mario


From: Kirk Lund 
Sent: 30 June 2020 19:54
To: dev@geode.apache.org 
Subject: Re: negative ActiveCQCount

I think *show metrics --categories=query* is showing you the query stats
from DistributedSystemMXBean (see
ShowMetricsCommand#writeSystemWideMetricValues). DistributedSystemMXBean
aggregates values across all members in the cluster, so I would have
expected activeCQCount to initially show a value of 2 after you create a
ServerCQImpl in 2 servers. Then after closing the CQ, it should drop to a
value of 0.

When you create a CQ on a Server, it should be reflected asynchronously on
the CacheServerMXBean in that Server. Each Server has its own
CacheServerMXBean. Over on the Locator (JMX Manager), the
DistributedSystemMXBean aggregates the count of active CQs in
ServerClusterStatsMonitor by invoking
DistributedSystemBridge#updateCacheServer when the CacheServerMXBean state
is federated to the Locator (JMX Manager).

Based on what I see in code and in the description on GEODE-8293, I think
you might want to see if increment has a problem instead of decrement.

I don't see anything that would limit the activeCQCount to only count the
CQs on primaries. So, I would expect redundancy=1 to result in a value of
2. Does anyone else have different info about this?

On Tue, Jun 30, 2020 at 5:31 AM Mario Kevo  wrote:

> Hi geode-dev,
>
> I have a question about CQ(
> https://issues.apache.org/jira/browse/GEODE-8293).
> If we run a CQ, it registers the CQ on one of the
> servers (setPoolSubscriptionRedundancy is 1) and increments activeCQCount.
> As I understand, it then processes the input buffer on another server, where
> the message is deserialized. In case the opType is REGISTER_CQ or
> SET_CQ_STATE it will call readCq from CqServiceProvider, which at the end calls
> the empty constructor of ServerCQImpl, used for deserialization.
>
> The problem is that when we close the CQ, there is a ServerCqImpl reference on both
> servers; it closes them and decrements on both of them. In that case we have a
> negative value of activeCQCount in the show metrics command.
>
> Does anyone know how, in the close method, to determine which is the primary and
> only decrement on it?
> Any advice is welcome!
>
> BR,
> Mario
>



Re: Us vs Docker vs Gradle vs JUnit

2020-06-30 Thread Anilkumar Gingade
It feels like, first, we should choose the right resources/tools suited for 
the task at hand that help achieve the expected result (testing that is easier 
to develop, run, monitor and report on); and then invest in them once, even if 
that means adding new tools/subroutines to the product.

E.g.:
Best suited for above requirement:
Runtime environment - Containers (?)
Testing framework - Junit
Build tools - gradle
Reporting/logging - (?)
Managing/Monitoring - (?) 

-Anil.


On 6/30/20, 1:21 PM, "Donal Evans"  wrote:

+1 for fixing the tests. It'll be a lot of work, but it'll only be a lot of 
work once, as opposed to taking on maintenance of our own custom Docker plugin, 
which would be an ongoing effort and not at all immune to getting broken again 
at some point in the future.

From: Jinmei Liao 
Sent: Tuesday, June 30, 2020 12:28 PM
To: dev@geode.apache.org 
Subject: Re: Us vs Docker vs Gradle vs JUnit

I would vote for fixing the tests to use Gradle's normal forking. If we are 
going to invest time and effort, let's invest in an option that reduces our 
dependencies.

From: Jacob Barrett 
Sent: Tuesday, June 30, 2020 11:30 AM
To: dev@geode.apache.org 
Subject: Us vs Docker vs Gradle vs JUnit

All,

We are in a bit of a pickle. As you may recall, a few years back, in an 
effort to both stabilize and parallelize integration, distributed and other 
integration/system-like tests, we adopted Docker. Many of the tests reused the same 
ports for services, which caused them to fail or interact with each other when 
run in parallel. By using Docker to isolate each test we put a bandage on that 
issue. The plugin overrides Gradle's default forked runner by starting the 
runners in Docker containers and marshaling the execution parameters to those 
Dockerized runners.

The Docker test plugin is effectively unmaintained. The author seems 
content to keep it compatible with Gradle 4. We forked it to work with 
Gradle 5 and to fix various other issues we have hit over the years. We have shared 
patches in the past with little luck in having them merged, and still it's only 
compatible with Gradle 4.8 at best. I spent some time trying to port it to 
Gradle 6, but it's going to be a larger undertaking given that Gradle 6 is fully 
Java modules compatible. They added new members throughout to handle modules in 
addition to class paths.

Long story short, because our tests can't be parallelized without a 
container system, we are stuck. We can't go to JUnit 5 without updating the Docker 
plugin (potentially minor changes). We can't go to Gradle 6 without updating 
the Docker plugin (potentially huge changes). Being stuck is not a good place. 
I see two paths out of this:

1) We buckle down and fix the tests so they can run in parallel via the 
normal forking mechanism of Gradle. I know some effort has been expended on 
this by using our new rules for starting servers. We would need to go further.

2) Fully invest in the Docker plugin. We would need to fork this off as a 
fully maintained sub-project of Geode. We would need to add support for 
both Gradle 6 and JUnit 5.

My money is on fixing the tests. It is clear, at least from my exhaustive 
searching, that nobody in the Gradle and JUnit communities is isolating their tests 
with containers. They are creating containers to host services for system-level 
testing (see the Testcontainers project). The tests themselves run in the local 
kernel space (not in a container).

We made this push in the C++ and .NET tests, a much smaller set of tests, 
and it works great. The framework takes care to create clusters that do not 
interact with each other on the same host. Some things in Geode make this 
harder than others, like the http service not supporting ephemeral port selection, and 
gfsh not providing machine-readable output about ephemeral port selections. We 
use port knocking to prevent the OS from assigning a port ephemerally to 
another process. The framework knocks (opens and then closes) all the ports it 
needs for the server/locator services and then starts them explicitly on those 
ports. Because of port-recycling rules in the OS, another ephemeral port request 
won't get those ports for some time after they are closed. It's not perfect, but 
it works. Fixing Geode to support ephemeral port selection and better 
reporting mechanisms for those port choices would be more ideal. Also, we only 
start the services necessary for the test, e.g. we don't start the http ports if they 
aren't going to be used.

I would love some feedback and thoughts on this issue. Does anyone else see 
a different path forward?

-Jake








Re: [Proposal] Add REST command for Restore Redundancy to 1.13 (GEODE-8095)

2020-06-26 Thread Anilkumar Gingade
+1 As Donal said, complete the feature with all the available APIs.

On 6/26/20, 11:50 AM, "Donal Evans"  wrote:

+1

Although normally features wouldn't really count as "critical fixes" that 
would warrant inclusion after the release branch has been cut, in this case, 
the internal API and gfsh commands for restore redundancy are already in the 
release, and it makes much more sense to include the entire feature in one 
release rather than having a semi-complete feature in 1.13 and forcing the REST 
component to wait for a later release.

From: Mark Hanson 
Sent: Friday, June 26, 2020 10:06 AM
To: dev@geode.apache.org 
Subject: [Proposal] Add REST command for Restore Redundancy to 1.13 
(GEODE-8095)

Hello All,

The core of the restore redundancy call structure has been refactored to 
allow there to be a REST call to invoke a restore redundancy. At this point, 
looking forward to the 1.13 release it would be great if we could fit this into 
the 1.13 release.

What do people think?

Thanks,
Mark



Re: [PROPOSAL] Add windows jobs to PR checks

2020-06-25 Thread Anilkumar Gingade
Looking at the cost and the value derived, my vote is for the current/existing process 
(not running for every PR).

On 6/25/20, 11:39 AM, "Mark Hanson"  wrote:

I support adding it, but I think the time wasted is less than you think. 
I think for me the most important thing is finding an issue when it is introduced.

I think the current way is actually faster and more efficient, because 
every PR doesn't have to wait the 4 hours, and in reality the number of 
Windows failures is lower than the number of Linux failures.

Just a thought.

Thanks,
Mark


> On Jun 25, 2020, at 11:30 AM, Jianxia Chen  wrote:
> 
> +1 to add Windows tests to the PR pipeline. They may take longer to run
> (up to 4 hours). But consider the time wasted on reverting, fixing and
> resubmitting if there is a failure after merging to the develop branch. It
> is better to add the Windows tests to the PR pipeline. We can reevaluate
> and optimize the pipeline if the long running time is truly a concern.
> 
> On Thu, Jun 25, 2020 at 9:29 AM Kirk Lund  wrote:
> 
>> I merged some new AcceptanceTests to develop after having my PR go GREEN.
>> But now these tests are failing in Windows.
>> 
>> I'd like to propose that we add the Windows jobs to our PR checks if we
>> plan to keep testing on Windows in CI.
>> 
>> Please vote or discuss.
>> 
>> Thanks,
>> Kirk
>> 




Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

2020-05-28 Thread Anilkumar Gingade
Yes, the DLock machinery handles (has an option for) dlock grantor departure...

As I understand it, right now we have a dlock at the config-persistence layer, but this 
does not guarantee preserving the order in which the config changes are 
applied. E.g., a create region command followed by a destroy could be persisted 
in reverse order.

The proposal here is to move the lock from the persistence layer to a higher level to 
preserve the ordering. In this case the cost of taking the dlock remains the same, 
except the lock window becomes much larger.
The question is: is the window so large that it impacts the user experience, 
considering config changes are not frequent or highly concurrent (multiple clients 
changing/updating the config)? 
By having one single locking scheme, the logic remains simple, stable and 
manageable.
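The two schemes under discussion can be sketched in plain JVM terms. A ReentrantLock stands in for the distributed lock here (a real implementation would go through Geode's distributed lock service across members, which this sketch does not attempt), and keying the finer-grained locks by element ID is an assumption about how the per-ID variant might be organized:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the two locking schemes: one global lock serializing every CMS
// CRUD operation vs. a per-ID lock that lets operations on unrelated elements
// proceed concurrently while still ordering create/destroy on the same element.
public class CmsLocking {
    private final ReentrantLock globalLock = new ReentrantLock();
    private final Map<String, ReentrantLock> perIdLocks = new ConcurrentHashMap<>();

    // Scheme 1: only one CRUD operation in the whole "cluster" at a time.
    void withGlobalLock(Runnable op) {
        globalLock.lock();
        try { op.run(); } finally { globalLock.unlock(); }
    }

    // Scheme 2: serialize only operations that target the same element ID,
    // so "realize on servers" and "persist to config" stay atomic per element.
    void withIdLock(String id, Runnable op) {
        ReentrantLock lock = perIdLocks.computeIfAbsent(id, k -> new ReentrantLock());
        lock.lock();
        try { op.run(); } finally { lock.unlock(); }
    }
}
```

Either way, a create followed by a destroy of the same element acquires the same lock and therefore cannot be persisted in reverse order, which is the ordering guarantee discussed above.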
 
>> " Another way is to use a dlock per ID to only synchronize CRUD operation on 
>> the same ID element"
Is this possible for all config commands? What does "ID" refer to? Is it the region 
name (for region-level ops)? If so, does this require parsing the command request, 
or is it already available? E.g. create index 
 
>> not sure what's the cost of creating a dlock
The cost depends on who the dlock grantor is. If the create request is on the 
grantor itself, it's cheaper... If it's a peer node, then the cost is sending the 
lock request message to the grantor. In most cases the cost is in sending the 
message to the grantor. Which is not bad, considering the configuration does 
not change frequently. 

-Anil.

On 5/28/20, 11:08 AM, "Jinmei Liao"  wrote:

Simultaneous updates to configurations are already protected by a different 
dlock, so I assume they can be made safely.

Typically a CMS operation involves two parts:

1) updates to the servers to "realize" the configuration
2) updates to the configurations to "persist" it.

The purpose of the CMS-level dlock is to make these two parts atomic; if 
create/delete operations on the same element happen in a non-atomic fashion, 
we could end up with an inconsistent state between what's persisted and what's 
realized.

I believe the dlock can be configured to expire after a period of time.

From: Anthony Baker 
Sent: Thursday, May 28, 2020 10:40 AM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] make Cluster Management Service CRUD operations 
thread safe

I think the first question to answer is:  can simultaneous updates to 
configuration be made safely?  Or what is the critical section of code that 
needs to be protected?

Another thing to consider with dlocks is what happens in the failure case 
when the lock is not properly released.  Does it continue to block all 
management /configuration operations?  Does it expire after a period of time?

Anthony


> On May 28, 2020, at 10:17 AM, Jinmei Liao  wrote:
>
> The proposal suggests using ONE single dlock to synchronize all CMS 
CRUD operations; that means that at any given time only one CRUD operation in CMS is 
allowed in the entire cluster, which seems a bit too harsh.
>
> Another way is to use a dlock per ID to only synchronize CRUD operations 
on the same ID element. That way, we allow more concurrency. I am just not sure 
what the cost of creating a dlock is. Does the cost of creating a dlock per ID 
warrant the performance gain?
>
> Comment/Suggestions?
>
> Jinmei
> 
> From: Jinmei Liao 
> Sent: Tuesday, May 26, 2020 1:02 PM
> To: dev@geode.apache.org 
> Subject: [PROPOSAL] make Cluster Management Service CRUD operations 
thread safe
>
> Hi, Geode Community,
>
> Currently, the CMS CRUD operations are not thread safe: if one call tries 
to create a region and another call tries to delete the same region, with unlucky 
timing we could end up with an inconsistent state (between what's in cluster config 
and what's actually on the server). So we should make these operations thread safe. 
Here is the proposal to try to achieve it:
>
> 
https://cwiki.apache.org/confluence/display/GEODE/Make+Cluster+Management+Service%28CMS%29+Thread+Safe
>
> Comments/suggestions welcome.
>
> Jinmei




Re: [PROPOSAL] Move definition of Region separator character to geode-common

2020-05-18 Thread Anilkumar Gingade
The Region separator should not be user-visible. In the past, we tried
to remove the need for this from the end user or any other place. If we look
at its usage, it is mostly for sub-regions, and we don't recommend much
use of those.
I was also wondering about its use by external or management modules: do they
have to know about it? They just have to pass in what the user has provided or
typed in.

-Anil.



On Mon, May 18, 2020 at 10:16 AM Udo Kohlmeyer  wrote:

> I was wondering: why do we need this Region.SEPARATOR to be
> anywhere outside of Region?
>
> Geode-management was purposefully designed NOT to have a dependency on
> core. Creating a new dependency on a donor module just means that the
> management module will now start knowing about geode.
>
> I suggest if you want to make sure that management also uses a common
> Region.SEPARATOR, then maybe create a class inside of management for now OR
> we have to look at management to better understand WHY it requires this
> knowledge and if there could not be a different implementation to avoid
> creating a new donor project.
>
> The whole idea behind modularity is that modules don't expose their
> internals. The Region Separator is REGION specific. That knowledge should
> be kept there. It should not be proliferated around or moved into a
> "common" module, just because there is a leak.
>
> @Donal you are closest to the code, but would it maybe not make more sense
> to just define that constant and maybe raise a JIRA so that we can address
> this "leakage" at a later stage?
>
> --Udo
> 
> From: Jacob Barrett 
> Sent: Saturday, May 16, 2020 11:02 PM
> To: dev@geode.apache.org 
> Subject: Re: [PROPOSAL] Move definition of Region separator character to
> geode-common
>
> Probably. Unfortunately we haven't been very good about cleaning these up
> and moving forward with a Java modules plan. It's gonna bite us.
>
> > On May 16, 2020, at 8:08 PM, Donal Evans  wrote:
> >
> > In that case, would it also make sense to move the existing
> GeodeGlossary
> > class to org.apache.geode.common.internal, from its current location in
> > org.apache.geode.util.internal?
> >
> >> On Sat, May 16, 2020 at 8:02 PM Jacob Barrett 
> wrote:
> >>
> >> I am fine as long as you make sure you use a package name that is going
> to
> >> be Java 9 modules safe. Two modules cannot export the same package. So
> if
> >> geode-commons is going to export org.apache.geode.util I think we will
> have
> >> collisions. I suggest org.apache.geode.common.
> >>
> >> -Jake
> >>
> >>
>  On May 16, 2020, at 1:23 PM, Donal Evans  wrote:
> >>>
> >>> I've recently been working on a little side project to replace every
> use
> >> of
> >>> a hardcoded "/" character in region names/paths with a reference to the
> >>> Region.SEPARATOR constant. I ran into some problems though, since the
> >>> geode-management module needs to know about the separator character (in
> >> the
> >>> Region and Index classes) but does not have a dependency on geode-core,
> >>> where the character is currently defined.
> >>>
> >>> Since the whole point of the exercise is to attempt to provide a single
> >>> place where the region separator character is defined, pulling the
> >>> definition down into a module upon which both geode-core and
> >>> geode-management depend seems like the sensible choice, so I'm
> proposing
> >> to
> >>> create a GeodePublicGlossary class (name entirely up for change) in the
> >>> geode-common/src/main/java/org/apache/geode/util/ package, moving the
> >>> definition there, then deprecating the definitions in the Region
> >> interface
> >>> in geode-core.
> >>>
> >>> To preempt a possible question, there already exists a GeodeGlossary
> >> class
> >>> (which defines the GEMFIRE_PREFIX constant), but it's in an internal
> >>> package, so isn't a suitable place to move the definition of the
> >> currently
> >>> user-visible region separator character.
> >>>
> >>> Any feedback or suggestions on this idea would be very welcome.
> >>
> >>
>
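For concreteness, a minimal sketch of what the proposed constant class might look like. The class name is Donal's placeholder, the intended package would be org.apache.geode.common per Jake's suggestion (omitted here so the snippet is self-contained), and the separator value is Geode's long-standing "/" — everything here is illustrative, not a final API:

```java
// Illustrative sketch only -- not the final API. The intended package per
// the discussion above would be org.apache.geode.common.
public final class GeodePublicGlossary {

  /** The character used to separate region names in a region path. */
  public static final char SEPARATOR_CHAR = '/';

  /** The String form of the region separator. */
  public static final String SEPARATOR = String.valueOf(SEPARATOR_CHAR);

  private GeodePublicGlossary() {
    // constants only; no instances
  }
}
```

The existing definitions in the Region interface in geode-core could then be deprecated and redefined in terms of this class.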


Re: [PROPOSAL] bring GEODE-8091 to support branches

2020-05-11 Thread Anilkumar Gingade
+1

On Mon, May 11, 2020 at 4:10 PM Jinmei Liao  wrote:

> https://issues.apache.org/jira/browse/GEODE-8091
>
> We've had users that were trying to use
> "--load-cluster-configuration-from-dir=true" when starting up a locator
> with a security manager, and they came across this failure on Geode 1.12
> and would like it to be fixed. Can I get a few +1s to port this back to
> the support branches?
>
>
> --
> Cheers
>
> Jinmei
>


Re: [PROPOSAL] include GEODE-8055 in support/1.13

2020-05-04 Thread Anilkumar Gingade
Since this issue was introduced in 1.7, meaning it has been present since
then, can it be added in the next release instead? Is there any strong
user/customer requirement to get this into 1.13?

-Anil.


On Mon, May 4, 2020 at 11:55 AM Jinmei Liao  wrote:

> I would like to include the fix for GEODE-8055 in the 1.13 branch. This
> would allow users to use gfsh to create an index on sub regions.
>
> --
> Cheers
>
> Jinmei
>


Re: About Geode rolling downgrade

2020-04-22 Thread Anilkumar Gingade
That's right; the no-downtime requirement is almost always managed by having
replicated cluster setups (disaster-recovery/backup sites). The data is
either pushed to both systems through the data ingesters or by using a WAN
setup.
The clusters are upgraded one at a time. If there is a failure during an
upgrade, or the upgrade needs to be rolled back, one system will always be
up and running.

-Anil.





On Wed, Apr 22, 2020 at 1:51 PM Anthony Baker  wrote:

> Anil, let me see if I understand your perspective by stating it this way:
>
> If cases where 100% uptime is a requirement, users are almost always
> running a disaster recovery site.  It could be active/active or
> active/standby but there are already at least 2 clusters with current
> copies of the data.  If an upgrade goes badly, the clusters can be
> downgraded one at a time without loss of availability.  This is because we
> ensure compatibility across the wan protocol.
>
> Is that correct?
>
>
> Anthony
>
>
>
> > On Apr 22, 2020, at 10:43 AM, Anilkumar Gingade 
> wrote:
> >
> >>> Rolling downgrade is a pretty important requirement for our customers
> >>> I'd love to hear what others think about whether this feature is worth
> > the overhead of making sure downgrades can always work.
> >
> > I/We haven't seen users/customers requesting rolling downgrade as a
> > critical requirement for them; most of the time they had both an old and
> > new setup to upgrade or switch back to an older setup.
> > Considering the amount of work involved, and code complexity it brings
> in;
> > while there are ways to downgrade, it is hard to justify supporting this
> > feature.
> >
> > -Anil.
>
>


Re: About Geode rolling downgrade

2020-04-22 Thread Anilkumar Gingade
>> Rolling downgrade is a pretty important requirement for our customers
>> I'd love to hear what others think about whether this feature is worth
the overhead of making sure downgrades can always work.

I/We haven't seen users/customers requesting rolling downgrade as a
critical requirement; most of the time they had both an old and a new
setup, so they could upgrade or switch back to the older setup.
Considering the amount of work involved and the code complexity it brings
in, while there are ways to downgrade, it is hard to justify supporting
this feature.

-Anil.





On Tue, Apr 21, 2020 at 2:01 PM Dan Smith  wrote:

> > Anyhow, we wonder what would be as of today the recommended or official
> way to downgrade a Geode system without downtime and data loss?
>
> I think the without downtime option is difficult right now. The most bullet
> proof way to downgrade without data loss is probably just to export/import
> the data, but that involves downtime. In many cases, you could restart the
> system with an old version if you have persistent data because the on disk
> format doesn't change that often, but that won't work in all cases. Or if
> you have multiple redundant WAN sites you could potentially shift traffic
> from one to the other and recreate a WAN site, but that also requires some
> work.
>
> > Rolling downgrade is a pretty important requirement for our customers so
> we would not like to close the discussion here and instead try to see if it
> is still reasonable to propose it for Geode maybe relaxing a bit the
> expectations and clarifying some things.
>
> I agree that rolling downgrade is a useful feature for some cases. I also
> agree we would need to add a lot of tests to make sure we really can
> support it. I'd love to hear what others think about whether this feature
> is worth the overhead of making sure downgrades can always work. As Bruce
> pointed out, we have made changes in the past and we will make changes in
> the future that may need additional logic to support downgrades.
>
> Regarding your downgrade steps, they look reasonable. You might consider
> downgrading the servers first. Rolling *upgrade* upgrades the locators
> first, so up to this point we have only tested a newer locator with an
> older server.
>
> -Dan
>
> On Mon, Apr 20, 2020 at 9:13 AM  wrote:
>
> > Hi,
> >
> > I agree that if we wanted to support limited rolling downgrade some other
> > version interchange needs to be done and extra tests will be required.
> >
> > Nevertheless, this could be done using gfsh or with a startup parameter.
> > For example, in the case you mentioned about the UDP messaging, some
> > command like: "enable UDP messaging" to put the system again in a state
> > equivalent to "upgrade in progress but not yet completed" that would
> allow
> > old members to join again.
> > I guess for each case there would be particularities but they should not
> > involve a lot of effort because most of the mechanisms needed (the ones
> > that allow old and new members to coexist) will have been developed for
> the
> > rolling upgrade.
> >
> > Anyhow, we wonder what would be as of today the recommended or official
> > way to downgrade a Geode system without downtime and data loss?
> >
> >
> > 
> > From: Bruce Schuchardt 
> > Sent: Friday, April 17, 2020 11:36 PM
> > To: dev@geode.apache.org 
> > Subject: Re: About Geode rolling downgrade
> >
> > Hi Alberto,
> >
> > I think that if we want to support limited rolling downgrade some other
> > version interchange needs to be done and there need to be tests that
> prove
> > that the downgrade works.  That would let us document which versions are
> > compatible for a downgrade and enforce that no-one attempts it between
> > incompatible versions.
> >
> > For instance, there is work going on right now that introduces
> > communications changes to remove UDP messaging.  Once rolling upgrade
> > completes it will shut down unsecure UDP communications.  At that point
> > there is no way to go back.  If you tried it the old servers would try to
> > communicate with UDP but the new servers would not have UDP sockets open
> > for security reasons.
> >
> > As a side note, clients would all have to be rolled back before starting
> > in on the servers.  Clients aren't equipped to talk to an older version
> > server, and servers will reject the client's attempts to create
> connections.
> >
> > On 4/17/20, 10:14 AM, "Alberto Gomez"  wrote:
> >
> > Hi Bruce,
> >
> > Thanks a lot for your answer. We had not thought about the changes in
> > distributed algorithms when analyzing rolling downgrades.
> >
> > Rolling downgrade is a pretty important requirement for our customers
> > so we would not like to close the discussion here and instead try to see
> if
> > it is still reasonable to propose it for Geode maybe relaxing a bit the
> > expectations and clarifying some things.
> >
> > First, I think supporting rolling downgrade does not 
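For reference, the export/import option Dan mentions earlier in the thread maps to gfsh's data snapshot commands. The locator address, region, member, and file names below are placeholders, and this sketch assumes a maintenance window, since export/import is not a rolling procedure:

```shell
# On the current (newer) cluster: snapshot a region's data to a file on
# the chosen member.
gfsh -e "connect --locator=localhost[10334]" \
     -e "export data --region=/example-region --file=example-region.gfd --member=server1"

# After standing the cluster back up on the older version: reload the data.
gfsh -e "connect --locator=localhost[10334]" \
     -e "import data --region=/example-region --file=example-region.gfd --member=server1"
```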

Re: Checking for a member is still part of distributed system

2020-04-17 Thread Anilkumar Gingade
Thanks Bruce.
Will take a look at "WaitForViewInstallation".

-Anil.






On Fri, Apr 17, 2020 at 3:44 PM Anilkumar Gingade 
wrote:

> Thanks Kirk.
> This is for PR clear; I ended up registering/adding a new membership
> listener on DistributionManager (DM).
>
> I was trying to take advantage of MembershipListener on PR region-advisor.
> It turns out that this gets called even before the view is updated on DM.
>
> -Anil
>
> On Fri, Apr 17, 2020 at 3:36 PM Kirk Lund  wrote:
>
>> Any requirements for this to be a User API vs internal API?
>>
>> For internal APIs, you can register a MembershipListener on
>> DistributionManager -- at least one flavor of which returns a
>> Set of current members which you could check
>> before relying on callbacks.
>>
>> On Fri, Apr 17, 2020 at 3:03 PM Anilkumar Gingade 
>> wrote:
>>
>> > Is there a better way to know if a member has left the distributed
>> system,
>> > than following:
>> > I am checking using:
>> >
>> "partitionedRegion.getDistributionManager().isCurrentMember(requester));"
>> >
>> > This returns true, even though the AdvisorListener on
>> > ParitionedRegion already processed memberDeparted() event.
>> >
>> > I want to know if a member has left after invoking the
>> membershipListener.
>> >
>> > -Anil.
>> >
>>
>


Re: Checking for a member is still part of distributed system

2020-04-17 Thread Anilkumar Gingade
Thanks Kirk.
This is for PR clear; I ended up registering/adding a new membership
listener on DistributionManager (DM).

I was trying to take advantage of MembershipListener on PR region-advisor.
It turns out that this gets called even before the view is updated on DM.

-Anil

On Fri, Apr 17, 2020 at 3:36 PM Kirk Lund  wrote:

> Any requirements for this to be a User API vs internal API?
>
> For internal APIs, you can register a MembershipListener on
> DistributionManager -- at least one flavor of which returns a
> Set of current members which you could check
> before relying on callbacks.
>
> On Fri, Apr 17, 2020 at 3:03 PM Anilkumar Gingade 
> wrote:
>
> > Is there a better way to know if a member has left the distributed
> system,
> > than following:
> > I am checking using:
> > "partitionedRegion.getDistributionManager().isCurrentMember(requester));"
> >
> > This returns true, even though the AdvisorListener on
> > ParitionedRegion already processed memberDeparted() event.
> >
> > I want to know if a member has left after invoking the
> membershipListener.
> >
> > -Anil.
> >
>


Checking for a member is still part of distributed system

2020-04-17 Thread Anilkumar Gingade
Is there a better way to know if a member has left the distributed system,
than following:
I am checking using:
"partitionedRegion.getDistributionManager().isCurrentMember(requester));"

This returns true, even though the AdvisorListener on
ParitionedRegion already processed memberDeparted() event.

I want to know if a member has left after invoking the membershipListener.

-Anil.
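The pattern discussed in the replies above — register a membership listener and consult the current-member view rather than relying on callback ordering — can be modeled in plain Java. DistributionManager and MembershipListener are Geode internals, so this is a self-contained analogue of the idea, not Geode code:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Self-contained analogue of registering a membership listener and
// checking the current membership view before relying on callbacks.
class MembershipTracker {
  interface Listener {
    void memberDeparted(String memberId);
  }

  private final Set<String> currentMembers = ConcurrentHashMap.newKeySet();
  private final List<Listener> listeners = new CopyOnWriteArrayList<>();

  void addListener(Listener l) {
    listeners.add(l);
  }

  void memberJoined(String memberId) {
    currentMembers.add(memberId);
  }

  void memberDeparted(String memberId) {
    // Update the view first, then notify listeners, so a listener that
    // re-checks isCurrentMember() sees a consistent answer.
    currentMembers.remove(memberId);
    listeners.forEach(l -> l.memberDeparted(memberId));
  }

  boolean isCurrentMember(String memberId) {
    return currentMembers.contains(memberId);
  }
}
```

The ordering matters: the problem described in this thread is precisely that the PR region-advisor's listener fires before the view is updated on the DM, so a listener that re-checked membership saw a stale answer.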


Re: Data ingestion with predefined buckets

2020-04-16 Thread Anilkumar Gingade
>> PutAllPRMessage.*

These are internal APIs/message protocols used to handle PartitionedRegion
messages.
The messages are sent from the originator node to peer nodes to operate on
a given partitioned region; they are not intended as application APIs.

We could consider looking at the code that determines the bucket id for
each of the putAll keys. If there is routing info that identifies a common
data store (bucket), the code could be optimized there...

My recommendation is still to use the existing APIs and to tune the putAll
map size. By reducing the map size, you will be pushing small chunks of
data to the server while the remaining data is acted upon (at the client),
which keeps both client and server busy at the same time. You can also look
at tuning the socket buffer size to fit your data size, so that the data is
written/read in a single chunk.

-Anil
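The tuning advice above — ship the payload in smaller putAll maps so client and server overlap work — amounts to batching on the client side. A self-contained sketch in plain Java (in a real client each batch would then be passed to Region.putAll; the batch size of 1000 below is just the figure from this thread):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutAllBatcher {
  // Split a large map into fixed-size chunks so each putAll call ships a
  // small batch to the server while the client prepares the next one.
  static <K, V> List<Map<K, V>> chunk(Map<K, V> data, int batchSize) {
    List<Map<K, V>> batches = new ArrayList<>();
    Map<K, V> current = new HashMap<>();
    for (Map.Entry<K, V> e : data.entrySet()) {
      current.put(e.getKey(), e.getValue());
      if (current.size() == batchSize) {
        batches.add(current);
        current = new HashMap<>();
      }
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```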


On Wed, Apr 15, 2020 at 7:01 PM steve mathew 
wrote:

> Anil, yes it's a kind of custom hash (which involves calculating a hash on
> all fields of a row). I have to stick to the predefined mechanism based on
> which source files are generated.
>
> It would be a great help if someone could guide me to any available
> *server-side internal API that provides bucket-level data ingestion*. While
> exploring I came across "*PartitionRegion.sendMsgByBucket(bucketId,
> PutAllPRMessage)*"..Can this API internally take care of redundancy
> (ingestion into secondary buckets on peer nodes)..?
>
> Can someone explain
> *PutAllPRMessage.operateOnPartitionedRegion(ClusterDistributionManager
> dm, PartitionedRegion pr,..)*? It seems this handles a putAll msg from a
> peer.. When is this required..?
>
> Thanks
>
> Steve M.
>
> On Wed, Apr 15, 2020 at 11:06 PM Anilkumar Gingade 
> wrote:
>
> > About api: I would not recommend using bucketId in api, as it is internal
> > and there are other internal/external apis that rely on bucket id
> > calculations; which could be compromised here.
> >
> > Instead of adding new APIs, probably looking at minimizing/reducing the
> > time spent may be a good start.
> >
> > BucketRegin.waitUntilLocked - A putAll thread could spend time here, when
> > there are multiple threads acting upon the same thread; one way to reduce
> > this is by tuning the putall size, can you try changing our putall size
> > (say start with 100).
> >
> > I am wondering about the time spent in hashcode(); is it a custom code?
> >
> > If you want to create the buckets upfront, you can try calling the
> method:
> > PartitionRegionHelper.assignBucketsToPartitions().
> >
> > -Anil
> >
> >
> > On Wed, Apr 15, 2020 at 8:37 AM steve mathew 
> > wrote:
> >
> > > Thanks Den, Anil and Udo for your inputs. Extremely sorry for late rely
> > as
> > > I took bit of time to explore and understand geode internals.
> > >
> > > It seems BucketRegion/Bucket terminology is not exposed to user but
> > still i
> > > am trying to achieve something that is uncommon and for which client
> API
> > is
> > > not exposed.
> > >
> > > *Details about Use-case/Client *
> > > - MultiThreadClient - Each task perform data-ingestion on specific
> > bucket.
> > > Each task knows the bucket number to ingest data. In-short client knows
> > > task-->bucket mapping.
> > > - Each task iteratively ingest-data into batch (configurable) of 1000
> > > records to the bucket assigned to it.
> > > - Parallelism is achieved by running multiple tasks concurrently.
> > >
> > >
> > > *When i tried with exisitng R.putAll() API, observed slow performance
> and
> > > related observations are* - Few tasks takes quite a longer time
> > (ThreaDump
> > > shows--> Thread WAITING on BucketRegin.waitUntilLocked), hence overall
> > > client takes longer time.
> > >  - Code profiling shows good amount of time spent during hash-code
> > > calculation. It seems key.hashCode() gets calculated in on both client
> > and
> > > server, which is not required for my use-case as task-->bucket mapping
> > > known before.
> > >  - putAll() client implementation takes care of Parallelism (using
> > > PRMetadata enabled thread-pool and reshuffle the keys internally), but
> in
> > > my-case that's taken care by multiple tasks each per buckrt within my
> > > client.
> > >
> > > *I have forked the Geode codebase and trying to extend it by providing
> a
> > > client API like, *
> > > //Region.java
> > > /**
> > >  * putAll records in specified bucket
> > >  */
>

Re: Data ingestion with predefined buckets

2020-04-15 Thread Anilkumar Gingade
About api: I would not recommend using bucketId in api, as it is internal
and there are other internal/external apis that rely on bucket id
calculations; which could be compromised here.

Instead of adding new APIs, probably looking at minimizing/reducing the
time spent may be a good start.

BucketRegion.waitUntilLocked - A putAll thread can spend time here when
there are multiple threads acting upon the same bucket; one way to reduce
this is by tuning the putAll size. Can you try changing your putAll size
(say, start with 100)?

I am wondering about the time spent in hashcode(); is it a custom code?

If you want to create the buckets upfront, you can try calling the method:
PartitionRegionHelper.assignBucketsToPartitions().

-Anil


On Wed, Apr 15, 2020 at 8:37 AM steve mathew 
wrote:

> Thanks Dan, Anil and Udo for your inputs. Extremely sorry for the late
> reply, as I took a bit of time to explore and understand Geode internals.
>
> It seems the BucketRegion/Bucket terminology is not exposed to users, but
> still I am trying to achieve something that is uncommon and for which a
> client API is not exposed.
> not exposed.
>
> *Details about Use-case/Client *
> - MultiThreadClient - Each task perform data-ingestion on specific bucket.
> Each task knows the bucket number to ingest data. In-short client knows
> task-->bucket mapping.
> - Each task iteratively ingest-data into batch (configurable) of 1000
> records to the bucket assigned to it.
> - Parallelism is achieved by running multiple tasks concurrently.
>
>
> *When I tried the existing R.putAll() API, I observed slow performance;
> the related observations are:*
>  - A few tasks take quite a long time (a thread dump shows the thread
> WAITING on BucketRegion.waitUntilLocked), hence the overall client takes
> longer.
>  - Code profiling shows a good amount of time spent during hash-code
> calculation. It seems key.hashCode() gets calculated on both client and
> server, which is not required for my use case as the task-->bucket mapping
> is known beforehand.
>  - The putAll() client implementation takes care of parallelism (using a
> PRMetadata-enabled thread pool and reshuffling the keys internally), but in
> my case that's taken care of by multiple tasks, each per bucket, within my
> client.
>
> *I have forked the Geode codebase and am trying to extend it by providing
> a client API like, *
> //Region.java
> /**
>  * putAll records in specified bucket
>  */
> *public void putAll(int bucketId, map) *
>
> I have already added the client-side message and related code (similar to
> putAllOp and its impl), and I am adding server-side code/BaseCommand,
> similar to the putAll code-path (cmdExecute()/virtualPut() etc.). *Is
> there any (internal) API that provides bucket-specific putAll and takes
> care of redundancy - secondary bucket ingestion - as well, that I can
> use/hook directly ..?*
>
> It seems that if I isolate bucket creation from the actual put flow
> (creating buckets prior to the putAll call), it may work better in my
> scenario. Hence:
> *Are there any recommendations for creating buckets explicitly prior to
> the actual PUT, rather than lazily within the putAll flow on the actual
> PUT? Is there any internal API available for this that can be used, or
> other means like FE etc.?*
>
> *Data processing/retrieval:* I am not going to use the get/getAll API, but
> will process the data using the FE and querying mechanisms once I achieve
> bucket-specific ingestion.
>
>
> *Overall thoughts on this API impl. ..?*
>
> Looking forward to the inputs..
> Thanks in advance.
>
> *Steve M*
>
>
>
>
>
> On Sat, Apr 11, 2020 at 7:12 PM Udo Kohlmeyer 
> wrote:
>
> > Hi there Steve,
> >
> > Firstly, you are correct, the pattern you are describing is not
> > recommended and possibly not even correctly supported. I've seen many
> > implementations of Geode systems and none of them ever needed to do what
> > you are intending to do. Seems like you are will to go through A LOT of
> > effort for a benefit I don't immediately realize.
> >
> > Also, I'm confused on what part of the "hashing" you are trying to avoid.
> > You will ALWAYS  have the hashing overhead. At the very least the key
> will
> > have to be hashed for put() and later on get().
> > As for the "file-per-bucket" request, there will always be some form of
> > bucket resolution that needs to happen. Be it a custom
> PartitionResolution
> > or default partition bucket resolver.
> >
> > In the code that Dan provided, you now have to manage the bucket number
> > explicitly in the client. When you insert data, you have to provide the
> > correct bucket number and if you retrieve the data, you have to provide
> the
> > correct bucket number, otherwise you will get "null" back. So this means
> > your client has to manage the bucket numbers. Because every subsequent
> > put/get that does not provide the bucket number, will possibly result in
> > some failure. In short, EVERY key operation (put/get) will require a
> > bucketNumber to function correctly, as the PartitionResolver is used.
> >
> > Maybe we can aid you better in suitable 

Re: Data ingestion with predefined buckets

2020-04-10 Thread Anilkumar Gingade
Did you look into "StringPrefixPartitionResolver", which doesn't need a
custom implementation?
https://geode.apache.org/docs/guide/111/developing/partitioned_regions/standard_custom_partitioning.html

You can try keys like "file1|key" - the prefix before the "|" is the
routing object, so all of one file's entries land in the same bucket.

-Anil.
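For illustration, this is how a prefix-based resolver derives its routing object: the documented default delimiter for StringPrefixPartitionResolver is "|", and everything before it becomes the routing key, so keys sharing a file prefix co-locate in one bucket. This is a plain-Java analogue of the idea, not Geode's implementation:

```java
// Self-contained analogue of prefix-based routing: the substring before
// the "|" delimiter is the routing object.
class PrefixRouting {
  static final String DELIMITER = "|";

  static String routingPrefix(String key) {
    int idx = key.indexOf(DELIMITER);
    if (idx < 0) {
      throw new IllegalArgumentException(
          "key \"" + key + "\" does not contain delimiter " + DELIMITER);
    }
    return key.substring(0, idx);
  }
}
```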


On Fri, Apr 10, 2020 at 4:02 PM Dan Smith  wrote:

> Hi Steve,
>
> Well, you can technically use more than just the key in your partition
> resolver. You can also use a callback argument, something like the below
> code. This would put all of your data into bucket 0.  The issue is that all
> operations will have to pass the callback argument, so if you need to do a
> get you will also need to pass a callback argument to do a get from the
> correct bucket.
>
> int callbackArgument = 0;
> region.putAll(hashmap_with_your_data, callbackArgument)
>
> class MyPartitionResolver implements PartitionResolver {
>   Object getRoutingObject(EntryOperation opDetails) {
>   return opDetails.getCallbackArgument().
>   }
> }
>
>
> -Dan
>
> On Fri, Apr 10, 2020 at 3:52 PM steve mathew 
> wrote:
>
> > Thanks Dan for your quick response.
> >
> > Though this may not be a recommended pattern, here I am targeting a
> > bucket-specific putAll and want to exclude hashing, as it turns out to
> > be an overhead in my scenario.
> > Is this achievable...? How should I define a PartitionResolver that
> > works generically and returns the respective bucket for a specific file?
> > What will be impacted if I opt for this route (fixed partitioning per
> > file)? I can think of horizontal scalability, as buckets are made
> > fixed... thoughts?
> >
> >
> > -Steve M.
> >
> >
> > On Sat, Apr 11, 2020, 1:54 AM Dan Smith  wrote:
> >
> > > Hi Steve,
> > >
> > > The bucket that data goes into is generally determined by the key. So
> for
> > > example if your data in File-0 is all for customer X, you can include
> > > Customer X in your region key and implement a PartitionResolver that
> > > extracts the customer from your region key and returns it. Geode will
> > then
> > > group all of the data for Customer X into a single bucket.
> > >
> > > You generally shouldn't have to target a specific bucket number (eg
> > bucket
> > > 0). But technically you can just by returning an integer from your
> > > PartitionResolver. If you return the integer 0, your data will go into
> > > bucket 0. Usually it's just better to return your partition key (eg
> > > "Customer X") and let geode hash that to some bucket number.
> > >
> > > -Dan
> > >
> > > On Fri, Apr 10, 2020 at 11:04 AM steve mathew <
> steve.mathe...@gmail.com>
> > > wrote:
> > >
> > > > Hello Geode devs and users,
> > > >
> > > > I have a set of files populated with data, fairly distributed, I want
> > to
> > > > put each file's data in a specific bucket,
> > > > like PutAll File-0 data into Geode bucket B0
> > > >   PutAll File-1 data into Geode bucket B1
> > > >
> > > >   and so on...
> > > >
> > > > How can i achieve this using geode client...?
> > > >
> > > > Can i achieve this using PartitonResolver or some other means...?
> > > >
> > > > Thanks in advance
> > > >
> > > > -Steve M.
> > > >
> > >
> >
>
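Dan's snippet above relies on Geode's PartitionResolver and EntryOperation interfaces, which need a running cache to exercise. As a self-contained illustration of the routing idea (plain-Java stand-ins, not Geode types): the resolver ignores the key and returns whatever callback argument the caller supplied, so the caller picks the bucket.

```java
// Self-contained analogue of the callback-argument resolver: the routing
// object is taken from the operation's callback argument, not the key.
class CallbackRouting {
  static class EntryOp {
    final Object key;
    final Object callbackArgument;

    EntryOp(Object key, Object callbackArgument) {
      this.key = key;
      this.callbackArgument = callbackArgument;
    }
  }

  interface Resolver {
    Object getRoutingObject(EntryOp op);
  }

  // Mirrors MyPartitionResolver from the thread: route purely on the
  // callback argument supplied with each put/get.
  static final Resolver CALLBACK_RESOLVER = op -> op.callbackArgument;
}
```

As Dan notes, the cost of this design is that every subsequent operation on the same entry must pass the same callback argument to reach the right bucket.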


Re: Data ingestion with predefined buckets

2020-04-10 Thread Anilkumar Gingade
Yes, you can use partition resolver to achieve this.
You can also look into "StringPrefixPartitionResolver" which doesn't need
custom implementation.
https://geode.apache.org/docs/guide/111/developing/partitioned_regions/standard_custom_partitioning.html

-Anil


On Fri, Apr 10, 2020 at 11:08 AM steve mathew 
wrote:

> Hello Geode devs and users,
>
> I have a set of files populated with data, fairly distributed, I want to
> put each file's data in a specific bucket,
> like PutAll File-0 data into Geode bucket B0
>   PutAll File-1 data into Geode bucket B1
>
>   and so on...
>
> How can I achieve this using a Geode client...?
>
> Can I achieve this using a PartitionResolver or some other means...?
>
> Thanks in advance
>
> -Steve M.
>


Re: Proposal to bring GEODE-7970 to support/1.12

2020-04-10 Thread Anilkumar Gingade
+1
Based on: The risk is low. Avoids false positives in automated
vulnerability scans.

On Fri, Apr 10, 2020 at 12:33 PM Dick Cavender  wrote:

> +1
>
> On Fri, Apr 10, 2020 at 11:16 AM Owen Nichols  wrote:
>
> > Recently it’s been noticed that spring-core-5.2.1.RELEASE.jar is getting
> > flagged for “high" security vulnerability CVE-2020-5398.
> >
> > Analysis shows that Geode does not use Spring in a manner that would
> > expose this vulnerability (none of our REST apis or pulse set a
> > Content-Disposition header derived from user-supplied input).
> >
> > The risk of bringing GEODE-7970 is low.  This patch update from 5.2.1 to
> > 5.2.5 brings bug fixes only.  This exact version was on develop from Apr
> 8
> > - Apr 10 & passed all tests.
> >
> > This fix is critical to avoid false positives in automated vulnerability
> > scans.
> >
> > -Owen
>


Re: [PROPOSAL]: Include GEODE-7832, GEODE-7853 & GEODE-7863 in Geode 1.12.0

2020-03-19 Thread Anilkumar Gingade
+1 The changes and the risk look minimal.

On Thu, Mar 19, 2020 at 2:16 AM Alberto Bustamante Reyes
 wrote:

> +1
> 
> De: Donal Evans 
> Enviado: jueves, 19 de marzo de 2020 2:14
> Para: dev@geode.apache.org 
> Asunto: Re: [PROPOSAL]: Include GEODE-7832, GEODE-7853 & GEODE-7863 in
> Geode 1.12.0
>
> +1
>
> On Wed, Mar 18, 2020 at 4:53 PM Owen Nichols  wrote:
>
> > +3
> >
> > > On Mar 18, 2020, at 4:52 PM, Ju@N  wrote:
> > >
> > > Hello devs,
> > >
> > > I'd like to propose including the fixes for *GEODE-7832 [1]*,
> *GEODE-7853
> > > [2]* and *GEODE-7863 [3]* in release 1.12.0.
> > > All the changes are related to the work we have been doing in order to
> > > bring the performance closer to the baseline (*Geode 1.10*), we are not
> > > quite there yet but it would be good to include these fixes into the
> > > release anyways.
> > > Best regards.
> > >
> > > [1]: https://issues.apache.org/jira/browse/GEODE-7832
> > > [2]: https://issues.apache.org/jira/browse/GEODE-7853
> > > [3]: https://issues.apache.org/jira/browse/GEODE-7863
> > >
> > > --
> > > Ju@N
> > > --
> > > Ju@N
> >
> >
>


Re: Tips on using AsyncInvocation in DUnit Tests

2020-03-18 Thread Anilkumar Gingade
Thanks Kirk. Can you add an example here...

On Wed, Mar 18, 2020 at 11:12 AM Kirk Lund  wrote:

> Tips on using AsyncInvocation:
>
> * Always use await() or get()
> * Both check and throw any remote exceptions
> * Both use GeodeAwaitility Timeout and will throw TimeoutException if it’s
> exceeded
> * Use await() for Void types and get() when expecting a non-null value
>
> Recent improvements:
>
> Timeout now gets a remote stack trace to use as the cause and dumps stack
> traces for that JVM’s threads.
>
> You can also declare your instance of AsyncInvocation as a Future and
> simply use the standard Java API for Futures. This basically means the test
> will invoke get() for both Void and non-Void types.
>
> AsyncInvocation handles everything for you when you invoke await() or get()
> -- there is no need to invoke any of the deprecated APIs on
> AsyncInvocation:
> * Both use the GeodeAwaitility Timeout and throw TImeoutException
> * If Timeout occurs, AsyncInvocation will use the remote stack trace of the
> stuck thread as the cause and it will also print all threads stacks for
> that DUnit VM to facilitate debugging
> * Both will check for a remote failure and rethrow it
>
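Since Anil asked for an example: DUnit's VM.invokeAsync() and AsyncInvocation need a running DUnit harness, so here is the Future-style usage Kirk describes, with a plain executor standing in for the remote VM. The names, the computed value, and the timeout are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// In a real DUnit test, `Future<Integer> async = vm0.invokeAsync(...)`
// would replace the executor's submit() call.
class AsyncExample {
  static int runRemoteTask() throws Exception {
    ExecutorService vm = Executors.newSingleThreadExecutor();
    try {
      Future<Integer> async = vm.submit(() -> 6 * 7);
      // get() both waits for completion and rethrows any remote failure
      // as an ExecutionException -- always call it.
      return async.get(30, TimeUnit.SECONDS);
    } finally {
      vm.shutdown();
    }
  }
}
```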


Re: [PROPOSAL] eliminate file count loophole in PR StressNewTest

2020-03-03 Thread Anilkumar Gingade
The stress test is meant to identify flakiness within a test, on the
assumption that changes to a test may have introduced the flakiness.
It's about paying the cost upfront rather than later, when the test is
determined to be flaky.
If 25+ tests have been changed in a PR, there is the cost of running the
stress test for all of them and of gating the PR for so long.
Knowing how much pain it causes to fix a flaky test after a long duration
of time, I am +1 for doing this change.

On Tue, Mar 3, 2020 at 10:06 AM Dan Smith  wrote:

> What's the current timeout for StressNewTest? Maybe if we just up the
> threshold to 100 tests or so and up the timeout to match we can catch
> pretty much all PRs.
>
> I'm not sure why the job is flagging more tests than it should. It looks
> like at some point @rhoughon changed it to read the merge base from some
> file created by concourse as an optimization [1] - I suspect maybe that
> file is inaccurate?
>
> I originally wrote this job. It's definitely not a panacea, it will only
> catch a new flaky test if
>  - the test is really flaky (likely to fail more than 1/50 times)
>  - the change actually happened in the test file itself, and not the
> product or some other test file.
>
> [1]
>
> https://github.com/apache/geode/commit/4c06ba4625e69d44a5165aa9f2fccddfc064de87
>
> -Dan
>
> On Sun, Mar 1, 2020 at 9:00 PM Owen Nichols  wrote:
>
> > We don’t tend to look too closely at successful PR checks to see whether
> > they actually checked anything at all.
> >
> > One example I found is
> >
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/5957
> > <
> >
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/5957
> > >:
> > 32 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.
> >
> > Here are 92 more examples (url’s omitted for brevity — use the example
> > above as a template and just replace the last 4 digits):
> > 26 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6243)
> > 26 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6249)
> > 26 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6402)
> > 27 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6262)
> > 27 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6430)
> > 27 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6439)
> > 27 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6449)
> > 27 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6454)
> > 27 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6458)
> > 27 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6459)
> > 28 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6224)
> > 28 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6441)
> > 28 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6448)
> > 28 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6452)
> > 29 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6102)
> > 29 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6177)
> > 30 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 5939)
> > 30 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 5940)
> > 30 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 5949)
> > 30 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6473)
> > 31 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 5953)
> > 31 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6187)
> > 31 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6470)
> > 31 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6471)
> > 31 is too many changed tests to stress test. Allowing this job to pass
> > without stress testing.  (build 6474)
> > 31 is too many changed tests to stress test. Allowing 

Re: [DISCUSS] include geode-benchmarks in 1.12 release

2020-01-17 Thread Anilkumar Gingade
+1 to include the performance benchmark code. It gives the community an
opportunity to use it and build on it (a must when Geode is positioned as a
performant data product).



On Thu, Jan 16, 2020 at 6:35 PM Robert Houghton 
wrote:

> Let's not vote until there is a call to vote, folks...
>
>
>
> On Thu, Jan 16, 2020, 18:31 Jacob Barrett  wrote:
>
> > I would characterize my vote as 0. I really don’t care either way. Just
> > sharing I think they have no value in a release.
> >
> > > On Jan 16, 2020, at 6:08 PM, Owen Nichols  wrote:
> > >
> > > Geode PMC has 52 members.  If this were a vote, it looks like the
> > results would have been:
> > > +1: 2 (Anthony, Dan)
> > > -1: 1 (Jake)
> > >
> > > If the next release manager were to go ahead and put geode-benchmarks
> in
> > the Geode 1.12.0 source release, at least 3 PMC members would need to be
> > willing to vote +1.  So it sounds like we need a few more of the other 49
> > PMC members to weigh in on this discussion.
> > >
> > > To summarize so far:
> > >
> > > Proposal:
> > > - add a geode-benchmarks-n.n.n-src.tgz artifact to all Geode releases
> > going forward, starting with 1.12.0
> > >
> > > Arguments in favor:
> > > - why not?
> > > - it’s already public
> > > - we should default to including all things
> > > - it might be of interest to the user community
> > > - it might encourage contributions back to further improve it
> > > - it is required by CI, which is already included
> > > - Apache mandates that source releases must include test code too
> > >
> > > Arguments against:
> > > - doing nothing is less work
> > > - it will burden PMC members with additional work to validate and vote
> > on RCs
> > > - nobody outside the dev community has asked for it to be included
> > > - maybe it’s not ready
> > > - maybe it’s not documented well enough
> > > - it’s not needed to use Geode
> > > - Apache's legal separation between dev stuff and public release stuff
> > > - legal or license review may not have been conducted yet
> > >
> > >
> > >>> On Jan 16, 2020, at 4:48 PM, Dan Smith  wrote:
> > >>>
> > >>> If geode-benchmarks is included, that implies that an RC cannot be
> > >> approved until reviewers can successfully run the benchmark suite from
> > the
> > >> geode-benchmarks source distribution.  Is that what we want?
> > >>
> > >> I think it would be sufficient to run the tests of the benchmarks, eg
> > >> ./gradlew test
> > >>
> > >>> Deploying CI pipelines and running Benchmarks seems like a prime
> > example
> > >> of things we’d be happy to help others in the community with on the
> dev
> > >> list — but not something we would expect questions about on the user
> > list.
> > >>
> > >> I think it would be valuable to share our benchmarks with the geode
> user
> > >> community. The benchmark framework itself (the harness module) is a
> > fairly
> > >> generic benchmarking framework that can be used to benchmark anything
> > that
> > >> can be spun up using java. The geode-benchmark module has geode
> > benchmarks
> > >> that could be used for testing specific hardware, for example.
> > >>
> > >> -Dan
> > >>
> > >>> On Thu, Jan 16, 2020 at 12:37 PM Owen Nichols 
> > wrote:
> > >>>
> > >>> When voting on RC candidates, PMC members "are required to download
> the
> > >>> signed source code package, compile it as provided, and test the
> > resulting
> > >>> executable on their own platform”.
> > >>>
> > >>> If geode-benchmarks is included, that implies that an RC cannot be
> > >>> approved until reviewers can successfully run the benchmark suite
> from
> > the
> > >>> geode-benchmarks source distribution.  Is that what we want?
> > >>>
> > >>> Similarly, if CI is included, that seems to imply that an RC cannot
> be
> > >>> approved until reviewers can stand up their own pipeline from the
> > geode/ci
> > >>> source distribution.  Is that what we want?
> > >>>
> > >>> So far there doesn’t seem to be consensus on what to include in a
> Geode
> > >>> source release, but let’s keep in mind that anything we add to the
> > release
> > >>> becomes an Act Of The Foundation and is held to a higher standard.
> > Apache
> > >>> makes a clear distinction between development activity and
> > official
> > >>> releases to the public.  Development activity is anything that should
> > stay
> > >>> within the dev list.  Deploying CI pipelines and running Benchmarks
> > seems
> > >>> like a prime example of things we’d be happy to help others in the
> > >>> community with on the dev list — but not something we would expect
> > >>> questions about on the user list.
> > >>>
> >  On Jan 16, 2020, at 10:23 AM, Dan Smith  wrote:
> > 
> >  We are supposed to be including all of the source necessary to test
> > Geode
> >  in the source release [1] - I think that would include benchmarks as
> > >>> well.
> > 
> >  I don't really see any compelling reason *not* to include the
> > benchmarks,
> >  let's go ahead and get them into our 

Re: [DISCUSS] abandon branch protection rules

2019-12-27 Thread Anilkumar Gingade
I would like to keep it as is... In my opinion this should not be seen as
policing; rather, it is a concerted effort toward keeping the code stable, and
a way to isolate problems sooner rather than later (after multiple PRs have
merged, which makes it harder). Yes, I agree it may be annoying to sit on a
code change that doesn't look related to the CI failures, but it helps us
work as a team to address the failures being reported.

On Fri, Dec 27, 2019 at 3:05 PM Jason Huynh  wrote:

> Just to add more flavor to my previous response... I currently have a PR
> open that modified a method signature that touched a few WAN tests.  It was
> a simple change, removing an unused parameter.  StressNewTest failed and I
> had to spend another day figuring out 10 or so different failures.  A waste
> of time?  Maybe..  At first, I wasn't going to continue, but after trying a
> few things, it looks like the tests installed a listener that was hampering
> other tests.  At the end (soon once it gets reviewed/merged), we end up
> with a Green PR and hopefully have unblocked others on these specific tests
> in the future.
>
> On Fri, Dec 27, 2019 at 2:58 PM Jason Huynh  wrote:
>
> > I feel the frustration at times, but I do also think the ci/pipelines are
> > improving, breaking less often.  I'm ok with the way things are for the
> > moment
> >
> > On Fri, Dec 27, 2019 at 1:47 PM Owen Nichols 
> wrote:
> >
> >> In October we agreed to require at least 1 reviewer and 4 passing PR
> >> checks before a PR can be merged.  Now that we’re tried it for a few
> >> months, do we like it?
> >>
> >> I saw some strong opinions on the dev list recently:
> >>
> >> > Changes to the infrastructure to flat out prevent things that should
> be
> >> self policing is annoying. This PR review lock we have had already cost
> us
> >> valuable time waiting for PR pipelines to pass that have no relevance to
> >> the commit, like CI work. I hate to see process enforced that keeps us
> from
> >> getting work done when necessary.
> >>
> >>
> >> and
> >>
> >> > I think we're getting more and more bureaucratic in our process and
> >> that it stifles productivity.  I was recently forced to spend three days
> >> fixing tests in which I had changed an import statement before they
> would
> >> pass stress testing.  I'm glad the tests now pass reliably but I was
> very
> >> frustrated by the process.
> >>
> >>
> >> Just wondering if others feel the same way.  Is it time to make some
> >> changes?
> >>
> >> -Owen
> >
> >
>


Re: WAN replication issue in cloud native environments

2019-12-06 Thread Anilkumar Gingade
Alberto,

Can you please file a JIRA ticket for this. This could come up often as
more and more deployments move to K8s.

-Anil.
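The equality collapse described further down this thread (profiles keyed only on hostname and port) can be reproduced with a small self-contained sketch. `ServerEndpoint` and its fields are illustrative stand-ins, not Geode classes:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Stand-in for the locator's per-receiver profile objects. Equality is based
// on hostname and port only, mirroring the behavior reported in the thread.
final class ServerEndpoint {
    final String host;
    final int port;
    final String memberId; // distinguishes the real servers, but equals() ignores it

    ServerEndpoint(String host, int port, String memberId) {
        this.host = host;
        this.port = port;
        this.memberId = memberId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ServerEndpoint)) return false;
        ServerEndpoint other = (ServerEndpoint) o;
        return port == other.port && host.equals(other.host);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port);
    }
}

public class WanProfileCollision {
    public static void main(String[] args) {
        Set<ServerEndpoint> receivers = new HashSet<>();
        // Two distinct gw receivers behind the same k8s service:
        // same hostname-for-senders, same port.
        receivers.add(new ServerEndpoint("gw.example.com", 5000, "server-0"));
        receivers.add(new ServerEndpoint("gw.example.com", 5000, "server-1"));
        System.out.println(receivers.size()); // prints 1 — one receiver "disappears"
    }
}
```

Once one of the two servers stops, the single surviving map entry is removed, which matches the "no active servers" symptom even though a receiver is still alive.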


On Fri, Dec 6, 2019 at 8:33 AM Sai Boorlagadda 
wrote:

> > if one gw receiver stops, the locator will publish to any remote locator
> that there are no receivers up.
>
> I am not sure if locators proactively update remote locators about a change
> in the receivers list; rather, I think the senders figure this out on
> connection issues.
> But I see the problem that local-site locators have only one member in the
> list of receivers that they maintain, as all receivers register with a
> single address.
>
> One idea I had earlier is to statically set receivers list to locators
> (just like remote-locators property) which are exchanged with gw-senders.
> This way we can introduce a boolean flag to turn off wan discovery and use
> the statically configured addresses. This can be also useful for
> remote-locators if they are behind a service.
>
> Sai
>
> On Thu, Dec 5, 2019 at 2:33 AM Alberto Bustamante Reyes
>  wrote:
>
> > Thanks Charlie, but the issue is not about connectivity. Summarizing the
> > issue, the problem is that if you have two or more gw receivers that are
> > started with the same value of "hostname-for-senders", "start-port" and
> > "end-port" (being "start-port" and "end-port" equal) parameters, if one
> gw
> > receiver stops, the locator will publish to any remote locator that there
> > are no receivers up.
> >
> > And this use case is likely to happen on cloud-native environments, as
> > described.
> >
> > BR/
> >
> > Alberto B.
> > 
> > De: Charlie Black 
> > Enviado: miércoles, 4 de diciembre de 2019 18:11
> > Para: dev@geode.apache.org 
> > Asunto: Re: WAN replication issue in cloud native environments
> >
> > Alberto,
> >
> > Something else to think about SNI based routing.   I believe Mario might
> be
> > working on adding SNI to Geode - he at least had a proposal that he
> > e-mailed out.
> >
> > Basics are the destination host is in the SNI field and the proxy can
> > inspect and route the request to the right service instance. Plus we
> > have the option to not terminate the SSL at the proxy.
> >
> > Full disclosure - I haven't tried out SNI based routing myself and it is
> > something that I thought could work as I was reading about it.   From the
> > whiteboard I have done I think this will do ingress and egress just fine.
> > Potentially easier then port mapping and `hostname for clients` playing
> > around.
> >
> > Just something to think about.
> >
> > Charlie
> >
> >
> > On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes
> >  wrote:
> >
> > > Hi Jacob,
> > >
> > > Yes, we are using LoadBalancer service type. But note the problem is not
> > > the transport layer but on Geode as GW senders are complaining
> > > “sender-2-parallel : Could not connect due to: There are no active
> > > servers.” when one of the servers in the receiving cluster is killed.
> > >
> > > So, there is still one server alive in the receiving cluster but GW
> > sender
> > > does not know it and the locator is not able to inform about its
> > existence.
> > > Looking at the code it seems internal data structures (maps) holding
> the
> > > profiles use objects whose equality check relies only on hostname and
> > port.
> > > This makes it impossible to differentiate servers when the same
> > > “hostname-for-senders” and port are used. When the killed server comes
> > back
> > > up, the locator profiles are updated (internal map back to size()=1
> > > although 2+ servers are there) and GW senders happily reconnect.
> > >
> > > The solution with the Geode as-is would be to expose each GW receiver
> on
> > a
> > > different port outside of k8s cluster, this includes creating N
> > Kubernetes
> > > services for N GW receivers in addition to updating the service mesh
> > > configuration (if it is used, firewalls etc…). The declarative nature of
> > > Kubernetes means we must know the ports in advance; hence start-port and
> > > end-port when creating each GW receiver must be equal and we should
> have
> > > some well-known
> > > algorithm when creating GW receivers across servers. For example:
> > server-0
> > > port 5000, server-1 port 5001, server-2 port 5002 etc…. So, all GW
> > > receivers must be wired individually and we must turn off Geode’s
> random
> > > port allocation.
> > >
> > > But we are exploring the possibility for Geode to handle this
> > cloud-native
> > > configuration a bit better. Locators should be capable of holding GW
> > > receiver information although they are hidden behind same hostname and
> > port.
> > > This is a code change in Geode and we would like to have community
> > opinion
> > > on it.
> > >
> > > Some obvious impacts with the legacy behavior would be when locator
> picks
> > > a server on behalf of the client (GW sender in this case) it does so
> > based
> > >  on the server load. When sender connects and considering 

Re: [VOTE] Release candidate for Apache Geode version 1.11.0.RC3.

2019-12-05 Thread Anilkumar Gingade
Trying to get a conclusion out of it:
- The SDG/STDG to address the issue by changing the code on its part
- Create JIRA ticket for the issue raised. And prioritize/work the issue in
coming GEODE release.

-Anil.


On Thu, Dec 5, 2019 at 10:12 AM Owen Nichols  wrote:

> > On Dec 4, 2019, at 10:09 PM, Jacob Barrett  wrote:
> >
> > I think we can tone down the inflammatory statements
>
> Jake, thank you for speaking up, I felt the same way but wasn’t sure how
> to say it.
>
> This might be a good opportunity for all of us to review the
> https://cwiki.apache.org/confluence/display/GEODE/Code+of+Conduct


Re: [DISCUSS/VOTE] Proposal to bring GEODE-7465 to release/1.11.0

2019-11-26 Thread Anilkumar Gingade
+1

On Tue, Nov 26, 2019 at 11:32 AM Udo Kohlmeyer  wrote:

> This is no-brainer
>
> *+1*
>
> On 11/26/19 11:27 AM, Owen Nichols wrote:
> > I would like to propose bringing “GEODE-7465: Set eventProcessor to null
> in serial AEQ when it is stopped” into the 1.11 release (necessitating an
> RC4).
> >
> > Without the fix, a sequence of ordinary gfsh commands will leave the WAN
> gateway in an unrecoverable hung state:
> > stop gateway-sender
> > start gateway-sender
> > The only recourse is to restart the server.
> >
> > This fix is critical because the distributed system fails to sync data
> between WAN sites as the user would expect.
> > This issue did exist in previous releases, but recent enhancements to
> WAN/AEQ such as AEQ-pause are increasing user interaction with WAN-related
> gfsh commands.
> >
> > The fix is simple, low risk, tested, and has been on develop for 5 days:
> >
> https://github.com/apache/geode/commit/e148cef9cb63eba283cf86bc490eb280023567ce
>


Re: Cache.close is not synchronous?

2019-11-25 Thread Anilkumar Gingade
Looking at the code, the cache.close() and InternalCacheBuilder.create()
are synchronized on "GemFireCacheImpl.class"; it's the
InternalCacheBuilder create that seems to be using a reference to the old
distributed system.
The GemFireCacheImpl.getInstance() and getExisting() both perform an
"isClosing" check and return early. The InternalCacheBuilder is new;
not sure if it's missing those early checks.
-Anil.
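Until close() blocks the way Kirk describes below, tests can at least replace the fixed 2-second sleep with condition polling. A minimal sketch, assuming the test has some way to observe that the old system has fully disconnected; the helper itself uses no Geode API:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class Await {
    // Poll a condition until it holds or the timeout elapses.
    public static boolean awaitTrue(BooleanSupplier condition, long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(50); // short poll instead of one long blind sleep
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean oldSystemDisconnected = new AtomicBoolean(false);
        // Simulate the asynchronous teardown finishing 200 ms later.
        new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            oldSystemDisconnected.set(true);
        }).start();
        // Wait only as long as needed, with an upper bound.
        System.out.println(awaitTrue(oldSystemDisconnected::get, 5, TimeUnit.SECONDS));
    }
}
```

This shrinks the wait to tens of milliseconds in the common case while keeping a hard timeout, but it is a workaround; it doesn't remove the underlying race.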

On Mon, Nov 25, 2019 at 2:47 PM Mark Hanson  wrote:

> +1 to fix.
>
> > On Nov 25, 2019, at 2:02 PM, John Blum  wrote:
> >
> > +1 ^ 64!
> >
> > I found this out the hard way some time ago and is why STDG exists in the
> > first place (i.e. usability issues, particularly with testing).
> >
> > On Mon, Nov 25, 2019 at 1:41 PM Kirk Lund  wrote:
> >
> >> I found a test that closes the cache and then recreates the cache
> multiple
> >> times with 2 second sleep between each. I tried to remove the
> Thread.sleep
> >> and found that recreating the cache
> >> throws DistributedSystemDisconnectedException (see below).
> >>
> >> This seems like a usability nightmare. Anyone have any ideas WHY it's
> this
> >> way?
> >>
> >> Personally, I want Cache.close() to block until both Cache and
> >> DistributedSystem are closed and the API is ready to create a new Cache.
> >>
> >> org.apache.geode.distributed.DistributedSystemDisconnectedException:
> This
> >> connection to a distributed system has been disconnected.
> >>at
> >>
> >>
> org.apache.geode.distributed.internal.InternalDistributedSystem.checkConnected(InternalDistributedSystem.java:945)
> >>at
> >>
> >>
> org.apache.geode.distributed.internal.InternalDistributedSystem.getDistributionManager(InternalDistributedSystem.java:1665)
> >>at
> >>
> >>
> org.apache.geode.internal.cache.GemFireCacheImpl.(GemFireCacheImpl.java:791)
> >>at
> >>
> >>
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:187)
> >>at
> >>
> >>
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:158)
> >>at
> >> org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
> >>
> >
> >
> > --
> > -John
> > john.blum10101 (skype)
>
>


Re: [DISCUSS] add GEODE-7079 to release/1.9.2

2019-10-04 Thread Anilkumar Gingade
+1

On Fri, Oct 4, 2019 at 11:15 AM Juan José Ramos  wrote:

> +1
>
>
>
>
> On Fri, Oct 4, 2019 at 6:39 PM Jens Deppe  wrote:
>
> > On behalf of Juan I'm requesting approval to add GEODE-7079 to
> > release/1.9.2
> >
> > The original justification is:
> >
> > Long story short: GEODE-7079 can be hit by *spring-data-geode* users that
> > restart a member configured with a persistent asynchronous event queue
> > (with conflation enabled) without pausing the event processor. The
> ability
> > to pause the event processor is what we're mainly adding in 1.9.2, that's
> > why I believe this fix should also be included.
> >
> > Thanks
> > --Jens
> >
>
>
> --
> Juan José Ramos Cassella
> Senior Software Engineer
> Email: jra...@pivotal.io
>


Re: [DISCUSS] Logging module separation

2019-09-26 Thread Anilkumar Gingade
Dan, for some reason I can't view the diagram... It doesn't show up...


On Thu, Sep 26, 2019 at 11:52 AM Dan Smith  wrote:

> If you are wondering how this relates to the geode-log4j work that Kirk
> did, the following diagram might help. Basically, he made a geode-log4j
> module that makes log4j-core optional. This geode-logging module allows the
> use of some of our log4j wrapper classes from modules other than geode-core.
>
> [image: image.png]
>
> On Thu, Sep 26, 2019 at 11:26 AM Ernest Burghardt 
> wrote:
>
>> Dear Geode,
>>
>> In support of the Membership
>> <
>> https://cwiki.apache.org/confluence/display/GEODE/Move+membership+code+to+a+separate+gradle+sub-project
>> >
>> modularization efforts, we would like to move some of our logging code
>> that wraps log4j into a separate module; this is needed in order to break
>> dependencies on geode-core.
>>
>> The proposed module 
>> would be called "geode-logging" and would contain LogService,
>> LoggingThread, LoggingExecutor and related classes.
>>
>> As always, your feedback is welcomed and appreciated.
>>
>> Ernie and Dan
>>
>


Re: [VOTE] Adding a lucene specific fix to release/1.10.0

2019-09-19 Thread Anilkumar Gingade
+1

On Thu, Sep 19, 2019 at 11:02 AM Eric Shu  wrote:

> +1
>
>
> On Thu, Sep 19, 2019 at 10:59 AM Benjamin Ross  wrote:
>
> > +1
> >
> > On Thu, Sep 19, 2019 at 10:50 AM Nabarun Nag  wrote:
> >
> > > +1
> > >
> > > On Thu, Sep 19, 2019 at 10:49 AM Xiaojian Zhou 
> wrote:
> > >
> > > > I want to merge GEODE-7208, which is lucene specific fix
> > > >
> > > > The fix will enable indexing on inherited attributes in user object.
> > > >
> > > > revision 4ec87419d456748a7d853e979c90ad4e301b2405
> > > >
> > > > Regards
> > > > Gester
> > > >
> > >
> >
>


Re: [DISCUSS] Improvements on client function execution API

2019-09-16 Thread Anilkumar Gingade
Alberto,

Sorry for the late response... Currently Geode (the Java client) does provide
the ability to set a function timeout, but it's through the internal system
property "gemfire.CLIENT_FUNCTION_TIMEOUT". Some of the tests using this
property are tests extending "FunctionRetryTestBase".

Since this is an internal system property, we need a cleaner API to
achieve the timeout behavior.

+1 for the option a) proposed.

-Anil
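For reference, the internal knob mentioned above is a JVM-wide system property set before the client cache/pool is created. A sketch of today's workaround (the milliseconds unit is my assumption; being internal and process-global, this is exactly the kind of usage a per-invocation execute(timeout, unit) API would replace):

```java
public class ClientFunctionTimeoutProperty {
    public static void main(String[] args) {
        // Internal system property named in the thread; value assumed to be
        // milliseconds. Must be set before the client cache is created, so it
        // applies to ALL function executions in this JVM.
        System.setProperty("gemfire.CLIENT_FUNCTION_TIMEOUT", "30000");
        // ... create the ClientCache and execute functions as usual ...
        System.out.println(System.getProperty("gemfire.CLIENT_FUNCTION_TIMEOUT"));
    }
}
```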




On Mon, Sep 16, 2019 at 9:03 AM Dan Smith  wrote:

> Thanks for following up on this!
>
> -Dan
>
> On Mon, Sep 16, 2019 at 3:07 AM Alberto Gomez 
> wrote:
>
> > Thanks for the feedback. I also give a +1 to option a) including Dan's
> > comments.
> >
> > I'll move the RFC to the Development state and will open a ticket to
> > follow up on the implementation.
> >
> > -Alberto G.
> >
> > On 12/9/19 8:15, Jacob Barrett wrote:
> > > +1
> > >
> > > I echo Dan’s comments as well.
> > >
> > > Thanks for tackling this.
> > >
> > > -jake
> > >
> > >
> > >> On Sep 11, 2019, at 2:36 PM, Dan Smith  wrote:
> > >>
> > >> +1 - Ok, I think I've come around to option (a). We can go head and
> > add a
> > >> new execute(timeout, TimeUnit) method to the java API that is
> blocking.
> > We
> > >> can leave the existing execute() method alone, except for documenting
> > what
> > >> it is doing.
> > >>
> > >> I would like implement execute(timeout,  TimeUnit) on the server side
> as
> > >> well. Since this Execution class is shared between both client and
> > server
> > >> APIs, it would be unfortunate to have a method on Execution that
> simply
> > >> doesn't work on the server side.
> > >>
> > >> -Dan
> > >>
> > >>
> > >>> On Thu, Sep 5, 2019 at 9:25 AM Alberto Gomez  >
> > wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> First of all, thanks a lot Dan and Jacob for your feedback.
> > >>>
> > >>> As we are getting close to the deadline I am adding here some
> > conclusions
> > >>> and a refined proposal in order to get some more feedback and if
> > possible
> > >>> some voting on the two alternatives proposed (or any other in between
> > if
> > >>> you feel any of them is lacking something).
> > >>>
> > >>> I also add some draft code to try to clarify a bit the more complex
> of
> > the
> > >>> alternatives.
> > >>>
> > >>>
> > >>> Proposal summary (needs a decision on which option to implement):
> > >>>
> > >>>
> >
> ---
> > >>>
> > >>> In order to make the API more coherent two alternatives are proposed:
> > >>>
> > >>> a) Remove the timeout from the ResultCollector::getResult() /
> document
> > >>> that the timeout has no effect, taking into account that
> > >>> Execution::execute() is always blocking.
> > >>> Additionally we could add the timeout parameter to the
> > >>> Execution::execute() method of the Java API in order to align it with
> > the
> > >>> native client APIs. This timeout would not be the read timeout on the
> > >>> socket but a timeout for the execution of the operation.
> > >>>
> > >>> b) Change the implementation of the Execution::execute() method
> without
> > >>> timeout to be non-blocking on both the Java and native APIs. This
> > change
> > >>> has backward compatibility implications, would probably bring some
> > >>> performance decrease and could pose some difficulties in the
> > implementation
> > >>> on the C++ side (in the handling of timed-out operations that hold
> > >>> resources).
> > >>>
> > >>>
> > >>> The first option (a) is less risky and does not have impacts
> regarding
> > >>> backward compatibility and performance.
> > >>>
> > >>> The second one (b) is the preferred alternative in terms of the
> > expected
> > >>> behavior from the users of the API. This option is more complex to
> > >>> implement and as mentioned above has performance and backward
> > compatibility
> > >>> issues not easy to be solved.
> > >>>
> > >>> Following is a draft version of the implementation of b) on the Java
> > >>> client:
> > >>>
> > >>>
> >
> https://github.com/Nordix/geode/commit/507a795e34c6083c129bda7e976b9223d1a893da
> > >>>
> > >>> Following is a draft version of the implementation of b) on the C++
> > native
> > >>> client:
> > >>>
> > >>>
> >
> https://github.com/apache/geode-native/commit/a03a56f229bb8d75ee71044cf6196df07f43150d
> > >>>
> > >>> Note that the above implementation of b) in the C++ client implies
> that
> > >>> the Execution object returned by the FunctionService cannot be
> > destroyed
> > >>> until the thread executing the function asynchronously has finished.
> > If the
> > >>> function times out, the Execution object must be kept until the
> thread
> > >>> finishes.
> > >>>
> > >>>
> > >>> Other considerations
> > >>> -
> > >>>
> > >>> * Currently, in the function execution Java client there is not a
> > >>> possibility to set a timeout for the execution of functions. The
> > closest to
> > >>> this is the read timeout that may be set globally for 

Re: [VOTE] Adding new AEQ feature to release/1.10.0

2019-09-13 Thread Anilkumar Gingade
+1. This is needed for Spring Data Geode, whose upcoming release is based
on an older Geode version.

-Anil.


On Fri, Sep 13, 2019 at 3:23 PM Nabarun Nag  wrote:

> Hi Geode Community ,
>
> [GEODE-7121]
>
> I would like to include the new feature of creating AEQs with a paused
> event processor to the release 1.10 branch. This also includes the feature
> to resume the AEQ at a later point in time.
> This feature includes addition of new/modified APIs and gfsh commands.
>
> [All details about this feature has been discussed in a previous discuss
> thread]
>
> These are the commits that needs to be in release 1.10.0 branch.
> f6e11084daa30791f7bbf9a8187f6d1bc9c4b91a
> 615d3399d24810126a6d57b5163f7afcd06366f7
> 1440a95e266e671679a623f93865c5e7e683244f
> 42e07dc9054794657acb40c292f3af74b79a1ea6
> e1f200e2f9e77e986d250fde3848dc004b26a7c2
> 5f70160fba08a06c7e1fc48c7099e63dd1a0502b
> 0645446ec626bc351a2c881e4df6a4ae2e75fbfc
> 575c6bac115112df1e84455b052566c75764b0be
> 3d9627ff16443f4aa513a67bcc284e68953aff8a
> ea22e72916f8e34455800d347690e483727f9bf5
> 8d26d595f5fb94ff703116eb91bb747e9ba7f536
>
> Will create a PR ASAP.
>
> Regards
> Nabarun Nag
>


Re: [Proposal] Make gfsh "stop server" command synchronous

2019-09-10 Thread Anilkumar Gingade
It's a good option. But do we see any use cases where the user doesn't want
to wait for a server stop (if it's taking a long time) and would rather
proceed with other operations (say, executing commands on other servers)?
Also, I could not make out how this is related to GEODE-7017; the test case
seems to be related to starting the server...
-Anil.


On Tue, Sep 10, 2019 at 3:32 PM John Blum  wrote:

> `stop server` is synchronous (with an option to break out of the wait using
> CTRL^C) AFAIR.
>
> Way deep down inside, it simply relies on GemFireCache.close() to return
> (in-process).
>
> As Darrel mentioned, there is no "true" signal that the server was
> successfully stopped.
>
> -j
>
>
> On Tue, Sep 10, 2019 at 3:23 PM Darrel Schneider 
> wrote:
>
> > I think it would be good for stop server to confirm in some way that the
> > server has stopped before returning.
> >
> > On Tue, Sep 10, 2019 at 3:08 PM Mark Hanson  wrote:
> >
> > > Hello All,
> > >
> > > I would like to propose that we make the gfsh “stop server” command
> > > synchronous. It is causing some issues with some tests as the rest of
> the
> > > calls are blocking. Stop on the other hand immediately returns by
> > > comparison.
> > > This causes issues as shown in GEODE-7017 specifically.
> > >
> > > GEODE:7017 CI failure:
> > > org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest >
> > > startupReportsOnlineOnlyAfterRedundancyRestored
> > > https://issues.apache.org/jira/browse/GEODE-7017 <
> > > https://issues.apache.org/jira/browse/GEODE-7017>
> > >
> > >
> > > What do people think?
> > >
> > > Thanks,
> > > Mark
> >
>
>
> --
> -John
> john.blum10101 (skype)
>


Re: [DISCUSSION] should the DISTRIBUTED_NO_ACK Scope be deprecated?

2019-08-29 Thread Anilkumar Gingade
The use cases I can think of are edge, mesh, and IoT-related large-scale
streaming services where data consistency or data loss (for some time) is
not a concern; edge/mesh computing is getting traction nowadays. Geode also
supports an early-ack option which gives better throughput compared to
distributed-ack; considering that, distributed-no-ack could be deprecated.

-Anil.
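For context on what deprecation would retire, a non-partitioned region opts into no-ack distribution declaratively. A minimal cache.xml sketch (the region name is illustrative; as I recall the DTD, the scope attribute accepts distributed-ack, distributed-no-ack and global for non-partitioned regions):

```xml
<!-- Replicated region that distributes updates without waiting for acks.
     Deprecating DISTRIBUTED_NO_ACK would eventually retire this value. -->
<region name="telemetry">
  <region-attributes scope="distributed-no-ack" data-policy="replicate"/>
</region>
```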


On Thu, Aug 29, 2019 at 1:27 PM Darrel Schneider 
wrote:

> Geode currently allows creating non-partitioned regions with a
> DISTRIBUTED_NO_ACK Scope. This causes distributed operations on the region
> not to wait for acknowledgement but to just send the op and assume the
> remote member received it.
> Currently gfsh gives you no way to create a region DISTRIBUTED_NO_ACK. It
> always uses DISTRIBUTED_ACK.
> Partition regions do not support no-ack.
> Does anyone know of a reason why it should not be deprecated so that it can
> be removed in some future release?
>


Re: [DISCUSS] Improvements on client function execution API

2019-08-21 Thread Anilkumar Gingade
Just to be clear on the split between the Java and native-client APIs:

- Read timeout in the function execution Java client API - this is to change
the Java client behavior.

And the following are the native-client problems, where the solution applies
to the native client?

- Timeout in ResultCollector::getResult() and Execution::execute() blocking

-Anil.
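The blocking-vs-non-blocking distinction in the proposal below can be illustrated with plain java.util.concurrent types (no Geode API; names are illustrative): a non-blocking execute() hands back something future-like, and the timeout belongs to the result-collection step, which is the shape a timed getResult() presumes:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class NonBlockingExecuteSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // "execute" the function asynchronously; this returns immediately,
        // like Execution.execute() does for peers today.
        Future<String> resultCollector = pool.submit(() -> "function-result");

        // The wait — and therefore the timeout — happens at result-collection
        // time, analogous to ResultCollector.getResult(timeout, unit).
        String result = resultCollector.get(5, TimeUnit.SECONDS);
        System.out.println(result); // prints "function-result"

        pool.shutdown();
    }
}
```

With today's client behavior, execute() itself blocks, so by the time getResult(timeout, unit) runs there is nothing left to wait for — which is why the timeout parameter is currently dead code on the client side.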


On Wed, Aug 21, 2019 at 8:49 AM Alberto Gomez 
wrote:

> Hi,
>
> I have just added the following proposal in the wiki for discussion and
> would like to get feedback from the community.
>
>
> https://cwiki.apache.org/confluence/display/GEODE/%5BDiscussion%5D+Improvements+on+client+Function+execution+API
>
> Problem
>
> The client API for function execution is inconsistent in the following
> aspects:
>
> 1.Read timeout in function execution client API
>
> The client API for Function execution allows to set a timeout to wait for
> the execution of a function.
>
> Setting this timeout has the effect of setting a read timeout in the
> socket. If the timeout expires during the execution of a function before
> data has been received, the connection is closed and if the number of
> retries is reached, the execute method throws an exception.
>
> Nevertheless, how this timeout is set is not uniform across the different
> clients:
>
>   *   In the native C++ and C# clients, the timeout can be set on a per
> function execution basis by passing the timeout to the Execution::execute()
> method.
>   *   In the Java API, the timeout can only be set globally by a system
> property when starting the client process.
>
> 2. Timeout in ResultCollector::getResult()
>
> Apart from the timeout on the function execution, the client API offers
> the possibility of setting a timeout on the getResult() method of the
> ResultCollector object returned by the Execution::execute() method.
>
> Given that this method is blocking when invoked from a client (until all
> results have been received), the setting of this timeout has no effect
> at all. In fact, the DefaultResultCollector in the Java API just ignores
> the value of the timeout.
>
> Note that this timeout in the ResultCollector::getResult() method is
> useful when used inside a peer as function invocations are not blocking.
>
> 3. Blocking  Execution::execute()
>
> As mentioned above, the Execution::execute() method behaves differently
> when invoked from clients than from peers. When invoked from a client it
> blocks until all results are received, while when invoked from a peer it
> is non-blocking and the ResultCollector::getResult() method is used to
> wait for all the results.
>
> This is not explicit in the documentation of the interface and it has
> already been captured in the following ticket:
> https://issues.apache.org/jira/browse/GEODE-3817
>
> Anti-Goals
>
> -
>
> Solution
>
> In order to make the API more coherent, two actions are proposed:
>
> 1. Read timeout in function execution Java client API
>
> Add two new Execution::execute() methods in the Java client to offer the
> possibility of setting the timeout for the socket, just as is done in the
> C++ and C# clients.
>
> This action is simple to implement and there are no adverse effects
> attached to it.
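For readers unfamiliar with what setting a read timeout on the socket means in practice, here is a small self-contained sketch using plain java.net (not Geode's API — class and timeout values here are illustrative only) showing how a per-call read timeout bounds a blocking read:

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Returns true when the read hits the configured timeout instead of data.
    static boolean timesOut() throws Exception {
        // A server that accepts connections but never writes anything back,
        // standing in for a function execution that produces no data in time.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort())) {
            client.setSoTimeout(200); // per-call read timeout, in milliseconds
            try {
                client.getInputStream().read(); // blocks, then times out
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read timed out: " + timesOut());
    }
}
```

The per-execution variant in the C++ and C# clients amounts to applying such a timeout to the connection used by that one execute() call rather than process-wide.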
>
> 2. Timeout in ResultCollector::getResult() and Execution::execute()
> blocking
>
> Regarding the timeout in the ResultCollector::getResult() method problem
> and the blocking/non-blocking confusion for Execution::execute() two
> alternatives are considered:
>
> a) Remove the possibility of setting a timeout on the
> ResultCollector::getResult() method on the client side as with the current
> client implementation it is useless. This could be done by removing the
> method with the timeout parameter from the public API.
>
> It would be advisable to make explicit in the documentation that the
> getResult() method does not wait for results to arrive, as that waiting
> should have already been done in the Execution::execute() invocation.
>
> This alternative is very simple and would keep things pretty much as they
> are today.
>
> b) Transform the Execution::execute() method on the client side into a
> non-blocking method.
>
> This alternative is more complex and requires changes in all the clients.
> Apart from that, it has implications for the public client API: it requires
> moving the exceptions currently thrown by the Execution::execute() method
> to ResultCollector::getResult(), and new threads will have to be managed.
>
> An outline of a possible implementation for option b) would be:
>
>   *   Instead of invoking the ServerRegionProxy::executeFunction()
> directly as it is done today, create a Future that invokes this method and
> returns the resultCollector passed as parameter.
>   *   Create a new class (ProxyResultCollector) that will hold a Future
> for a ResultCollector and whose getResult() methods implementation would be
> something like:
>
> return this.future.get().getResult();
>
>   *   After creating the future that invokes
> ServerRegionProxy::executeFunction() create an 
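A rough, self-contained sketch of the ProxyResultCollector idea in the outline above. The ResultCollector interface here is a trimmed-down stand-in, not Geode's actual API, and the background task merely simulates the call that would invoke ServerRegionProxy::executeFunction():

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Trimmed-down stand-in for Geode's ResultCollector, for illustration only.
interface ResultCollector<T> {
    T getResult() throws InterruptedException, ExecutionException;
}

// Holds a Future for the real collector; getResult() blocks until the
// background execute() call has completed and then delegates to it.
class ProxyResultCollector<T> implements ResultCollector<T> {
    private final Future<ResultCollector<T>> future;

    ProxyResultCollector(Future<ResultCollector<T>> future) {
        this.future = future;
    }

    @Override
    public T getResult() throws InterruptedException, ExecutionException {
        return this.future.get().getResult();
    }
}

public class NonBlockingExecuteDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stands in for the Future that would invoke
        // ServerRegionProxy::executeFunction() in the real implementation.
        Callable<ResultCollector<String>> task = () -> {
            ResultCollector<String> done = () -> "function-result";
            return done;
        };
        ResultCollector<String> rc =
            new ProxyResultCollector<>(pool.submit(task));
        System.out.println(rc.getResult()); // blocks here, not at execute()
        pool.shutdown();
    }
}
```

With this shape, execute() can return the proxy immediately, and the blocking (plus any exception propagation) moves into getResult(), as the proposal describes.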

Re: [DISCUSS] what region types to support in the new management rest api

2019-08-20 Thread Anilkumar Gingade
My vote is for supporting all the region types currently supported. As Mike
was pointing out, we have seen use cases where different region types are
used for specific application needs.



On Tue, Aug 20, 2019 at 5:09 PM Darrel Schneider 
wrote:

> gfsh create region currently does not support "distributed-no-ack" nor
> "global". I did not find in jira a feature request for gfsh to support
> these. So I think it would be safe for the Geode Management REST API to
> also not support those scopes.
>
>
> On Tue, Aug 20, 2019 at 12:10 PM Kirk Lund  wrote:
>
> > Here's my 2cents: The Geode Management REST API should definitely support
> > "group" such that creation of a region may target zero, one, or more
> > groups.
> >
> > On Tue, Aug 20, 2019 at 10:45 AM Darrel Schneider  >
> > wrote:
> >
> > > Is "group" support on the PCC roadmap or is the plan for the members
> of a
> > > cluster to always be uniform?
> > >
> > > On Tue, Aug 20, 2019 at 9:56 AM Jinmei Liao  wrote:
> > >
> > > > So, it sounds like we still need to support *PROXY types. Is it OK to
> > > > drop support for LOCAL* region types in the management rest API?
> > > >
> > > > Also, regarding existing region shortcuts, we are experimenting with
> > > > using different object types to represent different types of region;
> > > > for example, the redundantCopies property should only exist in
> > > > partition regions. Instead of having a flat object that could have a
> > > > type of any of these values and holds all sorts of properties that
> > > > may or may not make sense for that type, should we just have a
> > > > factory method that, given one of these region shortcuts, returns a
> > > > specific region object determined by that type?
> > > >
> > > > On Tue, Aug 20, 2019 at 8:15 AM Jens Deppe 
> wrote:
> > > >
> > > > > Currently, when deployed to the cloud (aka PCC) there is no ability
> > > for a
> > > > > user to group members thus it is also not possible to create
> regions
> > > (via
> > > > > gfsh at least) that are separated by groups. Typically one would
> > > create a
> > > > > PROXY region against one group and the PARTITION region against
> > another
> > > > > group. However, without the ability to assign groups, that is not
> > > > possible.
> > > > >
> > > > > --Jens
> > > > >
> > > > > On Tue, Aug 20, 2019 at 7:46 AM Michael Stolz 
> > > wrote:
> > > > >
> > > > > > I know that lots of folks use PROXY regions on the server side to
> > > host
> > > > > > logic associated with the region, but I think they always do that
> > in
> > > > > > conjunction with server groups so that the proxy is on some of the
> > > > > > servers and the same region containing data is on others. Given the way
> > > > cache.xml
> > > > > > works they might not even bother with the server groups, but I'm
> > not
> > > > > sure.
> > > > > >
> > > > > > I think we should carry forward the existing shortcuts and not go
> > > > > backward
> > > > > > to the separate attributes.
> > > > > >
> > > > > > --
> > > > > > Mike Stolz
> > > > > > Principal Engineer, Pivotal Cloud Cache
> > > > > > Mobile: +1-631-835-4771
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 19, 2019 at 7:59 PM Darrel Schneider <
> > > > dschnei...@pivotal.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Keep in mind that the context of the regions in question is the
> > > > > cluster.
> > > > > > So
> > > > > > > these regions would be created on servers.
> > > > > > > So, for example, does anyone see a need to create PROXY regions
> > on
> > > > the
> > > > > > > server? Even if we did not support them on the server, they
> would
> > > > still
> > > > > > be
> > > > > > > supported on clients.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Aug 19, 2019 at 4:26 PM Jinmei Liao  >
> > > > wrote:
> > > > > > >
> > > > > > > > Region type (in another word Region shortcut) defines a set
> of
> > > > > > attributes
> > > > > > > > for a region. These are the list of region types we have:
> > > > > > > >
> > > > > > > > LOCAL,
> > > > > > > > LOCAL_PERSISTENT,
> > > > > > > > LOCAL_HEAP_LRU,
> > > > > > > > LOCAL_OVERFLOW,
> > > > > > > > LOCAL_PERSISTENT_OVERFLOW,
> > > > > > > >
> > > > > > > > PARTITION,
> > > > > > > > PARTITION_REDUNDANT,
> > > > > > > > PARTITION_PERSISTENT,
> > > > > > > > PARTITION_REDUNDANT_PERSISTENT,
> > > > > > > > PARTITION_OVERFLOW,
> > > > > > > > PARTITION_REDUNDANT_OVERFLOW,
> > > > > > > > PARTITION_PERSISTENT_OVERFLOW,
> > > > > > > > PARTITION_REDUNDANT_PERSISTENT_OVERFLOW,
> > > > > > > > PARTITION_HEAP_LRU,
> > > > > > > > PARTITION_REDUNDANT_HEAP_LRU,
> > > > > > > >
> > > > > > > > REPLICATE,
> > > > > > > > REPLICATE_PERSISTENT,
> > > > > > > > REPLICATE_OVERFLOW,
> > > > > > > > REPLICATE_PERSISTENT_OVERFLOW,
> > > > > > > > REPLICATE_HEAP_LRU,
> > > > > > > >
> > > > > > > > REPLICATE_PROXY,
> > > > > > > > PARTITION_PROXY,
> > > > > > > > PARTITION_PROXY_REDUNDANT,
> > > > > > > >
> > > > > > > > In region management rest api, 
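The factory-method idea raised earlier in this thread could look roughly like the following sketch. All class and field names here are hypothetical, not the actual management REST API types; it only illustrates mapping a shortcut to a type-specific config object that carries just the attributes valid for that region type:

```java
// Base config with attributes common to every region type.
abstract class RegionConfig {
    final String name;
    RegionConfig(String name) { this.name = name; }
}

// Carries redundantCopies, which is only meaningful for partitioned regions.
class PartitionRegionConfig extends RegionConfig {
    final int redundantCopies;
    PartitionRegionConfig(String name, int redundantCopies) {
        super(name);
        this.redundantCopies = redundantCopies;
    }
}

class ReplicateRegionConfig extends RegionConfig {
    ReplicateRegionConfig(String name) { super(name); }
}

class RegionConfigFactory {
    // Given a region shortcut name, return the specific config type for it.
    static RegionConfig create(String shortcut, String name) {
        if (shortcut.startsWith("PARTITION")) {
            int copies = shortcut.contains("REDUNDANT") ? 1 : 0;
            return new PartitionRegionConfig(name, copies);
        }
        if (shortcut.startsWith("REPLICATE")) {
            return new ReplicateRegionConfig(name);
        }
        throw new IllegalArgumentException("unsupported shortcut: " + shortcut);
    }
}
```

The payoff is that a client of the API can never set, say, redundantCopies on a REPLICATE region, because the type simply has no such field.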

Re: [DISCUSS] Controlling event dispatch to AsyncEventListener (review by Aug 22)

2019-08-20 Thread Anilkumar Gingade
Thanks for all the great feedback and comments.

*API Name change:*
*Suggestion:* startPaused, setManualStart,
startWithEventDispatcherPaused, createPaused()

*Start/Stop behavior:*


*- Manual start has caused a lot of trouble over the years.- Explain
starting AEQ in a paused state is different from creating gateway senders
with manual start*

Yes, we can adopt a name that is meaningful for the functionality. The
suggested name "pauseEventDispatchToListener()" is meant to make its
usage/action clear and to remove any ambiguity between adding/removing
events from the AEQ and dispatching events to the AEQ listener.

To emphasize, this is not the same as the GatewaySender manual start
(start/stop); manual start/stop controls enqueuing and dequeuing events
from the GatewaySender itself, and as Mike pointed out there are issues
with this (during recovery with parallel gateways), which is the reason it
has been deprecated.

The new functionality is similar to the "pause" and "resume" operations on
the GatewaySender, except that with the new API the AEQ is created in a
paused state.

The new API doesn't control adding and removing events from the AEQ; it
controls dispatching events to the listener. When the AEQ is created in a
paused state, events continue to be added to it and removed from it
(expiry). The new API will allow applications to create/manage any required
state/resources for the events before processing those events in the
application code.

*Cache level setting:*
*- Will it be more feasible if we can set the flag at cache level.*

A cache-level configuration affects all the AEQs, which may not be the
requirement. Having it at the AEQ level helps the application use this
capability only on the AEQs that require it, giving more control.

-Anil.







On Tue, Aug 20, 2019 at 2:01 PM Nabarun Nag  wrote:

> Hi Anil,
>
> Will it be possible to explain to the community how the starting AEQ in a
> paused state is different from creating gateway senders with manual start
> set to true. It may be of concern as 'manual start' for gateways is
> deprecated.
>
> Just thinking out loud, will it be more feasible if we can set the flag at
> cache level. Any framework that is starting up Apache Geode (E.g: Spring) ,
> creates the cache -> cache.pauseProcessing(); -> create regions -> create
> AEQs -> cache.unpauseProcessing()
>
> We can gate the processing of all event listeners at dispatchBatch().
>
> The advantage I feel is that
>  - we avoid introducing a new API to the AEQ creation factory.
>  - if we created 100 AEQs in paused state then we avoid having to have 100
> AEQ.unpause calls.
>
>
> Regards
> Naba
>
>
> On Tue, Aug 20, 2019 at 9:07 AM Michael Stolz  wrote:
>
> > Manual start has caused a lot of trouble over the years. We should
> > definitely circle back on those issues before traveling very far down
> this
> > road.
> >
> > --
> > Mike Stolz
> > Principal Engineer, Pivotal Cloud Cache
> > Mobile: +1-631-835-4771
> >
> >
> >
> > On Tue, Aug 20, 2019 at 11:56 AM Juan José Ramos 
> > wrote:
> >
> > > Hello Anil,
> > >
> > > +1 for the proposed solution.
> > > I'd change the method name from *pauseEventDispatchToListener* to
> > something
> > > more meaningful and understandable for our users, maybe *startPaused*?,
> > > *setManualStart* (as we currently have for the
> *GatewaySenderFactory*)?,
> > > *startWithEventDispatcherPaused*?.
> > > Best regards.
> > >
> > >
> > >
> > > On Sat, Aug 17, 2019 at 12:55 AM Anilkumar Gingade <
> aging...@pivotal.io>
> > > wrote:
> > >
> > > > I have updated the wiki based on Dan's comment.
> > > > Changes with api:
> > > >
> > > > *On "AsyncEventQueueFactory" interface - *
> > > >
> > > > *AsyncEventQueueFactory pauseEventDispatchToListener();  *// This
> > causes
> > > > AEQ to be created with paused state.
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Aug 16, 2019 at 4:36 PM Anilkumar Gingade <
> aging...@pivotal.io
> > >
> > > > wrote:
> > > >
> > > > > Dan,
> > > > >
> > > > > If you look into the API; the AEQ will be created with the pause
> > state.
> > > > > The user (application) has to call resume to dispatch the events.
> > > > >
> > > > > It will be slightly different from GatewaySender behavior; where
> > > > > GatewaySender will be created with run mode and then application
> has
> > to
>

Re: [DISCUSS] Controlling event dispatch to AsyncEventListener (review by Aug 22)

2019-08-16 Thread Anilkumar Gingade
I have updated the wiki based on Dan's comment.
Changes with api:

*On "AsyncEventQueueFactory" interface - *

*AsyncEventQueueFactory pauseEventDispatchToListener();  *// This causes
AEQ to be created with paused state.




On Fri, Aug 16, 2019 at 4:36 PM Anilkumar Gingade 
wrote:

> Dan,
>
> If you look into the API; the AEQ will be created with the pause state.
> The user (application) has to call resume to dispatch the events.
>
> It will be slightly different from GatewaySender behavior; where
> GatewaySender will be created with run mode and then application has to
> call pause on it. Here in this case AEQ will be created with paused state.
>
> -Anil.
>
>
> On Fri, Aug 16, 2019 at 4:31 PM Dan Smith  wrote:
>
>> Hi Anil,
>>
>> While I like the idea of matching the API of GatewaySender, I'm not sure I
>> see how this solves the problem. Is it required of the user to call pause
>> on the AsyncEventQueue as soon as it is created? How would someone do that
>> when creating AEQs with xml or cluster configuration? Maybe it would be
>> better to not dispatch any events until we are done creating all regions?
>>
>> -Dan
>>
>> On Fri, Aug 16, 2019 at 2:31 PM Anilkumar Gingade 
>> wrote:
>>
>> > Proposal to support controlling capability with event dispatch to
>> > AsyncEventQueue Listener.
>> >
>> > Wiki proposal page:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/GEODE/%5BDraft%5D+Controlling+event+dispatch+to+AsyncEventListener
>> >
>> > Here is the details from the wiki page:
>> > *Problem*
>> >
>> > *The Geode system requires AEQs to be configured before regions are
>> > created. If an AEQ listener is operating on a secondary region, this
>> could
>> > cause listener to operate on a region which is not yet created or fully
>> > initialized (for region with co-located regions) which could result in
>> > missing events or dead-lock scenario between region (co-located region)
>> > creation threads. This scenario is likely to happen during persistence
>> > recovery; when AEQs are created in the start, the recovered AEQ events
>> are
>> > dispatched immediately, thus invoking the AEQ listeners.*
>> > Anti-Goals
>> >
>> > None
>> > *Solution*
>> >
>> > *The proposed solution is to provide a way to control dispatching AEQ
>> > events to the AEQ Listeners, this could be done by adding "pause"  and
>> > "resume" capability to the AEQ, which will allow application to decide
>> when
>> > to dispatch events to the listeners. *
>> >
>> >
>> > *The proposal is similar to existing "pause" and "resume" behavior on
>> the
>> > GatewaySender, on which the AEQ is based on (AEQ implementation is a
>> > wrapper around GatewaySender). *
>> > Changes and Additions to Public Interfaces
>> >
>> > *The proposed APIs are:*
>> >
>> > *On "AsyncEventQueueFactory" interface - *
>> >
>> > *AsyncEventQueue pauseEventDispatchToListener();*
>> >
>> > *On "AsyncEventQueue" interface -*
>> >
>> > *boolean resumeEventDispatchToListener(); **returns true or false if the
>> > event dispatch is resumed successfully.*
>> >
>> >
>> > *The constraints on the pauseEventDispatchToListener() will remain
>> similar
>> > to as in "GatewaySender.pause()" :*
>> >
>> > "It should be kept in mind that the events will still be getting queued
>> > into the queue. The scope of this operation is the VM on which it is
>> > invoked. In case the AEQ is parallel, the AEQ will be paused on
>> individual
>> > node where this API is called and the AEQ on other VM's can still
>> dispatch
>> > events. In case the AEQ is not parallel, and the running AEQ on which
>> this
>> > API is invoked is not primary then primary AEQ will still continue
>> > dispatching events."
>> > Performance Impact
>> >
>> >
>> > *This will have similar performance and resource implication as with the
>> > "GatewaySender.pause()" functionality. If the AEQ is not resumed or
>> kept in
>> > "pause" state for long, it may start consuming the configured memory and
>> > overflow it into disk and may cause disk full scenario.*
>> > Backwards Compatibility and Upgrade Path
>> >
>> > *Impact with rolling upgrade

Re: [DISCUSS] Controlling event dispatch to AsyncEventListener (review by Aug 22)

2019-08-16 Thread Anilkumar Gingade
Dan,

If you look into the API; the AEQ will be created with the pause state. The
user (application) has to call resume to dispatch the events.

It will be slightly different from GatewaySender behavior; where
GatewaySender will be created with run mode and then application has to
call pause on it. Here in this case AEQ will be created with paused state.

-Anil.


On Fri, Aug 16, 2019 at 4:31 PM Dan Smith  wrote:

> Hi Anil,
>
> While I like the idea of matching the API of GatewaySender, I'm not sure I
> see how this solves the problem. Is it required of the user to call pause
> on the AsyncEventQueue as soon as it is created? How would someone do that
> when creating AEQs with xml or cluster configuration? Maybe it would be
> better to not dispatch any events until we are done creating all regions?
>
> -Dan
>
> On Fri, Aug 16, 2019 at 2:31 PM Anilkumar Gingade 
> wrote:
>
> > Proposal to support controlling capability with event dispatch to
> > AsyncEventQueue Listener.
> >
> > Wiki proposal page:
> >
> >
> https://cwiki.apache.org/confluence/display/GEODE/%5BDraft%5D+Controlling+event+dispatch+to+AsyncEventListener
> >
> > Here is the details from the wiki page:
> > *Problem*
> >
> > *The Geode system requires AEQs to be configured before regions are
> > created. If an AEQ listener is operating on a secondary region, this
> could
> > cause listener to operate on a region which is not yet created or fully
> > initialized (for region with co-located regions) which could result in
> > missing events or dead-lock scenario between region (co-located region)
> > creation threads. This scenario is likely to happen during persistence
> > recovery; when AEQs are created in the start, the recovered AEQ events
> are
> > dispatched immediately, thus invoking the AEQ listeners.*
> > Anti-Goals
> >
> > None
> > *Solution*
> >
> > *The proposed solution is to provide a way to control dispatching AEQ
> > events to the AEQ Listeners, this could be done by adding "pause"  and
> > "resume" capability to the AEQ, which will allow application to decide
> when
> > to dispatch events to the listeners. *
> >
> >
> > *The proposal is similar to existing "pause" and "resume" behavior on the
> > GatewaySender, on which the AEQ is based on (AEQ implementation is a
> > wrapper around GatewaySender). *
> > Changes and Additions to Public Interfaces
> >
> > *The proposed APIs are:*
> >
> > *On "AsyncEventQueueFactory" interface - *
> >
> > *AsyncEventQueue pauseEventDispatchToListener();*
> >
> > *On "AsyncEventQueue" interface -*
> >
> > *boolean resumeEventDispatchToListener(); **returns true or false if the
> > event dispatch is resumed successfully.*
> >
> >
> > *The constraints on the pauseEventDispatchToListener() will remain
> similar
> > to as in "GatewaySender.pause()" :*
> >
> > "It should be kept in mind that the events will still be getting queued
> > into the queue. The scope of this operation is the VM on which it is
> > invoked. In case the AEQ is parallel, the AEQ will be paused on
> individual
> > node where this API is called and the AEQ on other VM's can still
> dispatch
> > events. In case the AEQ is not parallel, and the running AEQ on which
> this
> > API is invoked is not primary then primary AEQ will still continue
> > dispatching events."
> > Performance Impact
> >
> >
> > *This will have similar performance and resource implication as with the
> > "GatewaySender.pause()" functionality. If the AEQ is not resumed or kept
> in
> > "pause" state for long, it may start consuming the configured memory and
> > overflow it into disk and may cause disk full scenario.*
> > Backwards Compatibility and Upgrade Path
> >
> > *Impact with rolling upgrade: *
> >
> > *As the api is applicable at individual VM level, there is no message
> > serialization changes involved. And only applicable to the events getting
> > dispatched to the listeners on that VM. And the AEQ which are replicated
> > (for redundancy) continues to work as before.*
> >
> > *Backward compatibility requirements: *
> >
> > *None. The AEQs are configured and managed at the server side. There is
> no
> > messaging involved between client/server.*
> >
> > *Disk formatting changes:*
> >
> > *None.*
> >
> > *Deprecation and Application Changes:*
> >
> >
> > *None. If needed, the existing application can be modified to control
> event
> > dispatch with AEQ listener.*
> > Prior Art
> >
> > *Without this, the AEQ listeners operating on other regions could
> > experience missing events or dead lock, if there are co-located regions.*
> >
> > *This approach is simple and can take advantage of the existing
> > functionality that is already supported in GatewaySender on which AEQ is
> > based on.*
> >
>


Re: Propose fix for 1.10 release: Export offline data command failed with EntryDestroyedException

2019-08-16 Thread Anilkumar Gingade
+1 to include

On Fri, Aug 16, 2019 at 2:41 PM Anthony Baker  wrote:

> +1 from me.  When you need to do an offline export, it’s usually
> important.  Not being able to export *all* the data might lead to data loss.
>
> Anthony
>
>
> > On Aug 16, 2019, at 2:06 PM, Udo Kohlmeyer  wrote:
> >
> > +1 to include
> >
> >
> > On 8/16/19 12:43 PM, Eric Shu wrote:
> >> Hi,
> >>
> >> I'd like to include the following commit (
> >> https://gitbox.apache.org/repos/asf?p=geode.git;h=aa33060) into Geode
> 1.10
> >> release.
> >>
> >> The commit fixes an issue where a user tries to export offline data to
> >> a snapshot file but it fails. This issue has existed since release
> >> 1.1.0. However, it is a critical issue, as it prevents users from
> >> getting the data from their backup disk stores.
> >>
> >> Regards,
> >> Eric
> >>
>
>


[DISCUSS] Controlling event dispatch to AsyncEventListener (review by Aug 22)

2019-08-16 Thread Anilkumar Gingade
Proposal to support controlling capability with event dispatch to
AsyncEventQueue Listener.

Wiki proposal page:
https://cwiki.apache.org/confluence/display/GEODE/%5BDraft%5D+Controlling+event+dispatch+to+AsyncEventListener

Here is the details from the wiki page:
*Problem*

*The Geode system requires AEQs to be configured before regions are
created. If an AEQ listener is operating on a secondary region, this could
cause the listener to operate on a region which is not yet created or fully
initialized (for regions with co-located regions), which could result in
missing events or a deadlock scenario between region (co-located region)
creation threads. This scenario is likely to happen during persistence
recovery: when AEQs are created at the start, the recovered AEQ events are
dispatched immediately, thus invoking the AEQ listeners.*
Anti-Goals

None
*Solution*

*The proposed solution is to provide a way to control dispatching AEQ
events to the AEQ Listeners. This could be done by adding "pause" and
"resume" capability to the AEQ, which will allow the application to decide
when to dispatch events to the listeners.*


*The proposal is similar to the existing "pause" and "resume" behavior on
the GatewaySender, on which the AEQ is based (the AEQ implementation is a
wrapper around GatewaySender).*
Changes and Additions to Public Interfaces

*The proposed APIs are:*

*On "AsyncEventQueueFactory" interface - *

*AsyncEventQueue pauseEventDispatchToListener();*

*On "AsyncEventQueue" interface -*

*boolean resumeEventDispatchToListener(); **returns true or false if the
event dispatch is resumed successfully.*


*The constraints on pauseEventDispatchToListener() will remain similar
to those of "GatewaySender.pause()":*

"It should be kept in mind that the events will still be getting queued
into the queue. The scope of this operation is the VM on which it is
invoked. In case the AEQ is parallel, the AEQ will be paused on individual
node where this API is called and the AEQ on other VM's can still dispatch
events. In case the AEQ is not parallel, and the running AEQ on which this
API is invoked is not primary then primary AEQ will still continue
dispatching events."
Performance Impact


*This will have similar performance and resource implications as the
"GatewaySender.pause()" functionality. If the AEQ is not resumed, or is
kept in the "paused" state for long, it may start consuming the configured
memory, overflow to disk, and cause a disk-full scenario.*
Backwards Compatibility and Upgrade Path

*Impact with rolling upgrade: *

*As the API applies at the individual VM level, there are no message
serialization changes involved; it is only applicable to the events being
dispatched to the listeners on that VM. AEQs that are replicated (for
redundancy) continue to work as before.*

*Backward compatibility requirements: *

*None. The AEQs are configured and managed at the server side. There is no
messaging involved between client/server.*

*Disk formatting changes:*

*None.*

*Deprecation and Application Changes:*


*None. If needed, an existing application can be modified to control event
dispatch to the AEQ listener.*
Prior Art

*Without this, AEQ listeners operating on other regions could experience
missing events or deadlock if there are co-located regions.*

*This approach is simple and can take advantage of the existing
functionality that is already supported in the GatewaySender, on which the
AEQ is based.*
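As a toy model of the proposed semantics — events keep being enqueued and removed while dispatch to the listener is held back until resume — the following self-contained sketch may help. It is not Geode's implementation; the class name and structure are invented purely to illustrate the pause/resume contract:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Toy model of the proposal: the queue always accepts events, but the
// listener only sees them once dispatch has been resumed.
class PausedDispatchQueue<E> {
    private final Queue<E> queue = new ConcurrentLinkedQueue<>();
    private final Consumer<E> listener;
    private volatile boolean paused;

    PausedDispatchQueue(Consumer<E> listener, boolean startPaused) {
        this.listener = listener;
        this.paused = startPaused;
    }

    // Enqueuing is never blocked by the paused state.
    void enqueue(E event) {
        queue.add(event);
        drain();
    }

    // Mirrors resumeEventDispatchToListener(): returns false if not paused.
    boolean resume() {
        if (!paused) {
            return false;
        }
        paused = false;
        drain();
        return true;
    }

    // Delivers queued events to the listener only while not paused.
    private void drain() {
        if (paused) {
            return;
        }
        E event;
        while ((event = queue.poll()) != null) {
            listener.accept(event);
        }
    }
}
```

In the real proposal the queue is the AEQ (backed by a GatewaySender), the listener is the AsyncEventListener, and pausing is scoped to the VM on which it is invoked.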


Re: Changing external methods to no longer throw UnsupportedOperationException

2019-05-23 Thread Anilkumar Gingade
I agree, this may not look like a use case that anyone would be using or
depending on. Going by the backward compatibility requirement, this would
be breaking that contract.
Again, based on the scenario and use cases, there could be exceptions. I am
trying to see whether the versioning support that's used to keep the
backward compatibility contract can be used here.

On Thu, May 23, 2019 at 10:17 AM Jacob Barrett  wrote:

> But what application is going to legitimately call this method and expect
> that it throw an exception? What would be the function of that usage?
>
> If you assume that calling this method under these conditions had no value
> and would therefor never have been called then one could argue that
> implementing this method is adding a feature. It adds a case where one
> could legitimately call this method under new conditions.
>
> > On May 23, 2019, at 10:06 AM, Anilkumar Gingade 
> wrote:
> >
> > As this changes the behavior on the existing older application; it seems
> to
> > break the backward compatibility requirements.
> > We use client versions to keep the contracts/behavior same for older
> > client; can we do the same here.
> >
> > -Anil.
> >
> >
> > On Thu, May 23, 2019 at 8:33 AM Darrel Schneider 
> > wrote:
> >
> >> Is it okay, in a minor release, to implement Region.getStatistics for
> >> partitioned regions? See GEODE-2685. The current behavior is for it to
> >> always throw UnsupportedOperationException. I doubt that any
> application is
> >> depending on that behavior but it could be. I know we have seen changes
> >> like this in the past break the tests of other products that are
> layered on
> >> top of Geode.
> >> Should this type of change be considered one that breaks backwards
> >> compatibility?
> >>
>
>


Re: Changing external methods to no longer throw UnsupportedOperationException

2019-05-23 Thread Anilkumar Gingade
As this changes the behavior of existing older applications, it seems to
break the backward compatibility requirements.
We use client versions to keep the contracts/behavior the same for older
clients; can we do the same here?

-Anil.


On Thu, May 23, 2019 at 8:33 AM Darrel Schneider 
wrote:

> Is it okay, in a minor release, to implement Region.getStatistics for
> partitioned regions? See GEODE-2685. The current behavior is for it to
> always throw UnsupportedOperationException. I doubt that any application is
> depending on that behavior but it could be. I know we have seen changes
> like this in the past break the tests of other products that are layered on
> top of Geode.
> Should this type of change be considered one that breaks backwards
> compatibility?
>


Re: [DISCUSS] reduce PR checks to JDK11 only

2019-05-16 Thread Anilkumar Gingade
Makes sense to me... looking at the probability of a commit breaking
specifically on JDK 8 or JDK 11.


On Wed, May 15, 2019 at 6:09 PM Owen Nichols  wrote:

> Currently every PR commit triggers both JDK8 and JDK11 versions of each
> test job.  I propose that we can eliminate the JDK8 version of each check.
> In the extremely rare case where a code change breaks on Java 8 but works
> fine on Java 11, it would still be caught by the main pipeline (just as
> Windows failures are caught only in the main pipeline).
>
> The only tangible effect today of running both JDK8 and JDK11 tests in PR
> pipeline is twice the chance to encounter possible flaky failures (usually
> unrelated to the commit itself).
>
>
>


Re: [DISCUSS] is it time to make Windows tests gating?

2019-05-16 Thread Anilkumar Gingade
>> around 5 hours, vs 2 hours for Linux tests).
Maybe a good time to look at reducing/optimizing this.


On Thu, May 16, 2019 at 9:57 AM Ernest Burghardt 
wrote:

> Yes make them gating.
> Run them every commit, Windows is a supported platform.
> Red boxes get attention and Red boxes get fixed.
>
> EB
>
> On Thu, May 16, 2019 at 1:09 AM Udo Kohlmeyer  wrote:
>
> > I think we need to make sure our Windows tests get to green... If we
> > make them gating then we will never release, but we will at the same
> > time be motivated to fix them, in order to release.
> >
> > Maybe they run once every day... to at least start getting an idea of
> > health
> >
> > On 5/15/19 18:28, Owen Nichols wrote:
> > > For a very long time we’ve had Windows tests in the main pipeline
> > (hidden away, not in the default view), but the pipeline proceeds to
> > publish regardless of whether Windows tests fail or even run at all.
> > >
> > > Now seems like a good time to review whether to:
> > > a) treat Windows tests as first-class tests and prevent the pipeline
> > from proceeding if any test fails on Windows
> > > b) keep as-is
> > > c) change Windows tests to trigger only once a week rather than on
> every
> > commit, if they are going to remain "informational only"
> > >
> > > One disadvantage to making Windows tests gating is that they currently
> > take much longer to run (around 5 hours, vs 2 hours for Linux tests).
> >
>


Re: Pulse - Non-Embedded Mode

2019-05-01 Thread Anilkumar Gingade
We should be supporting non-embedded mode; I believe most of the app-server
based use cases will be doing this. It also reduces the resource usage on
the Geode cluster.



On Wed, May 1, 2019 at 10:44 AM Dan Smith  wrote:

> Option 2 does sound like a good way to go. It does seem like if you are
> making changes to fix non-embedded mode, you probably need to add an
> acceptance test for that mode since there is non already, regardless of
> whether you deprecate non-embedded mode.
>
> I have no issues with deprecating either embedded or non-embedded mode. I
> don't think we've put a lot of energy into pulse recently.
>
> -Dan
>
>
> On Tue, Apr 30, 2019 at 2:12 PM Jens Deppe  wrote:
>
> > More accurately, I think geode-core is only required when TLS is enabled
> on
> > the locator and Pulse needs to make JMX/RMI calls over TLS.
> >
> > I would vote for option 2 in this scenario.
> >
> > --Jens
> >
> > On Tue, Apr 30, 2019 at 1:44 PM Jinmei Liao  wrote:
> >
> > > I believe that to run Pulse in non-embedded mode, you just need to
> > > install the war in a web server and make some configuration changes;
> > > you don't need geode-core at all.
> > >
> > > We do lack an acceptance test that runs Pulse in non-embedded mode,
> > > though. We have a few unit tests that touch some aspects of it.
> > >
> > > On Tue, Apr 30, 2019 at 12:10 PM Michael Oleske 
> > > wrote:
> > >
> > > > Hi Geode Community!
> > > >
> > > > Some colleagues and I were looking at GEODE-6683 (
> > > > https://issues.apache.org/jira/browse/GEODE-6683) and noticed that
> we
> > do
> > > > not have test coverage for running Pulse in non-embedded mode.  We
> were
> > > > wondering what our strategy is around Pulse in non-embedded mode. In
> > > order
> > > > to fully fix the issue, we would prefer to have a high-level
> acceptance
> > > > test that actually tries to run Pulse in non-embedded mode (we could
> not
> > > > find an existing acceptance test that performs this).   However, this
> > > > non-embedded mode seems a bit odd, as the instructions for it (
> > > >
> > > >
> > >
> >
> https://geode.apache.org/docs/guide/19/tools_modules/pulse/pulse-hosted.html
> > > > )
> > > > are slightly confusing and need some updating for geode (such as
> making
> > > > sure geode-core is on the class path). It seems strange to try and
> > host a
> > > > web app in this way, especially with the extra configuration needed
> > > (cannot
> > > > just plop the Pulse war file in my web server with some config and
> have
> > > it
> > > > work).  So there are some questions about the best path forward.
> > > >
> > > > 1.  Should we continue supporting non-embedded mode for Pulse?  It
> > seems
> > > > like it may be useful to allow Pulse to run outside of a member, but
> > not
> > > as
> > > > it currently does.  If it was deprecated, I wouldn't be as insistent
> on
> > > an
> > > > acceptance test for it.
> > > >
> > > > 2.  Should we try to make a separate artifact that is intended to be
> > > > deployed on a web server?  This would have a new artifact that could
> > run
> > > > elsewhere then (with maybe a user provided config file for
> properties.)
> > > >
> > > > 3.  For the issue that brought up these questions (GEODE-6683), we
> have
> > > > currently only written some unit tests to add the properties. So the
> > > > current question is what type of path forward should we take?
> > > >
> > > >
> > > > -michael
> > > >
> > >
> > >
> > > --
> > > Cheers
> > >
> > > Jinmei
> > >
> >
>


Re: [DISCUSS] TTL setting on WAN

2019-03-26 Thread Anilkumar Gingade
Yes. From our experiment that looked like a possibility.

-Anil.


On Tue, Mar 26, 2019 at 9:59 AM Dan Smith  wrote:

> Following up on the conflation thing - Anil and I did an experiment.
> Conflation definitely *does* happen on everything in the queue, not just
> the last batch. But we didn't see destroys get conflated with updates.
>
> So one thing that might make this use case work is to conflate the destroys
> with the updates. Then the disk space would be freed up when the expiration
> events are conflated in the queue.
>
> -Dan
>
> On Tue, Mar 26, 2019 at 8:19 AM Bruce Schuchardt 
> wrote:
>
> > I've been thinking along those lines as well Suranjan.  Since conflation
> > and expiry-forwarding don't solve the problem of running out of disk
> > space the solution needs to involve the dispatch thread.
> >
> > For the session-state caching scenario that raised this whole issue I
> > think what you've described will work.  Looking at it with a wider lens
> > I'm a little concerned about a TTL on the queue because multiple regions
> > can feed into the same queue and you might not have the same TTL
> > settings on all of those regions.
> >
> > On 3/25/19 4:53 PM, Suranjan Kumar wrote:
> > > Hi,
> > >   I think one approach for a user would be to 'filter' the events
> > > while dispatching. If I remember correctly, we can attach a filter at
> > > dispatch time and filter the events based on the creationTime of the
> > > GatewayEvent. We can provide a pre-created filter and enable it via
> > > configuration so that the user doesn't have to write their own.
> > >
> > > Something like:
> > >
> > > /**
> > >  * All events that spend timeToLive or more time in the queue will be
> > >  * deleted from the queue and will not be sent to the remote site.
> > >  * A possible consequence is that the two sites can become inconsistent.
> > >  */
> > > public GatewaySenderFactory setEntryTimeToLive(long timeToLive);
> > >
> > > As queues will be read in an LRU manner, this would be faster too. The
> > > only drawback is that there will be only one thread (not sure if we
> > > have a concurrent dispatcher yet) clearing the queue.
> > >
> > > As Udo/Dan mentioned above, user needs to be aware of the consequences.
> > >
> > >
> > > On Tue, Mar 26, 2019 at 3:09 AM Bruce Schuchardt <
> bschucha...@pivotal.io
> > >
> > > wrote:
> > >
> > >> I've walked through the code to forward expiration actions to async
> > >> event listeners & don't see how to apply it to removal of queue
> entries
> > >> for WAN.  The current implementation just queues the expiration
> > >> actions.  If we wanted to remove any queued events associated with the
> > >> expired region entry we'd have to scan the whole queue, which would be
> > >> too slow if we're overflowing the queue to disk.
> > >>
> > >> I've also walked through the conflation code.  It applies only to the
> > >> current batch being processed by the gateway sender.  The data
> structure
> > >> used to perform conflation is just a Map that is created in the
> sender's
> > >> batch processing method and then thrown away.
> > >>
> > >> On 3/20/19 11:15 AM, Dan Smith wrote:
> >  2) The developer wants to replicate _state_.  This means that
> implicit
> >  state changes (expiration or eviction w/ destroy) could allow us to
> >  optimize the queue size.  This is very similar to conflation, just a
> >  different kind of optimization.
> > 
> >  For this second case, does it make sense to allow the user to
> specify
> > a
> >  different TTL than the underlying region?  It seems like what the
> user
> >  wants is to not replicate stale data and having an extra TTL
> attribute
> >  would just be another value to mis-configure.  What do you think
> about
> > >> just
> >  providing a boolean flag?
> > 
> > 
> > >>> This kinda jogged my memory. AsyncEventQueues actually *do* have a
> > >> boolean
> > >>> flag to allow you to forward expiration events to the queue. I have
> no
> > >> idea
> > >>> how this interacts with conflation though -
> > >>>
> > >>
> >
> https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueueFactory.html#setForwardExpirationDestroy-boolean-
> >
>
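The filter-based TTL idea discussed in this thread can be sketched in miniature. This is a hedged illustration only: `QueuedEvent` and `filterExpired` are stand-in names invented here, not Geode's API. In real Geode code the hook would be a `org.apache.geode.cache.wan.GatewayEventFilter`, whose `beforeTransmit` callback can return false to veto an event before it is sent to the remote site.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a TTL-based drop filter for queued WAN events.
// QueuedEvent is a stand-in for Geode's GatewayQueueEvent; in real Geode
// code you would implement GatewayEventFilter.beforeTransmit and return
// false to drop the event. All names here are illustrative assumptions.
public class TtlDropFilterSketch {
    static final class QueuedEvent {
        final long creationTime; // millis when the event was enqueued
        final String payload;
        QueuedEvent(long creationTime, String payload) {
            this.creationTime = creationTime;
            this.payload = payload;
        }
    }

    // Returns only the events younger than ttlMillis; older events are
    // dropped before transmission, freeing queue (and disk) space.
    static List<QueuedEvent> filterExpired(List<QueuedEvent> queue,
                                           long ttlMillis, long now) {
        List<QueuedEvent> toSend = new ArrayList<>();
        for (QueuedEvent e : queue) {
            if (now - e.creationTime < ttlMillis) {
                toSend.add(e);
            }
        }
        return toSend;
    }

    public static void main(String[] args) {
        List<QueuedEvent> q = new ArrayList<>();
        q.add(new QueuedEvent(0, "old"));      // enqueued at t=0
        q.add(new QueuedEvent(9_000, "new"));  // enqueued at t=9s
        // With a 5s TTL evaluated at t=10s, only the second event survives.
        List<QueuedEvent> sent = filterExpired(q, 5_000, 10_000);
        System.out.println(sent.size() + " " + sent.get(0).payload); // 1 new
    }
}
```

As Bruce and Udo note above, dropping expired events this way trades disk safety for cross-site consistency, so a real filter would need to be opt-in per sender.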


Re: [DISCUSS] TTL setting on WAN

2019-03-20 Thread Anilkumar Gingade
+1. Will the expiration (destroy) be applied on local queues, or will the
expiration be replicated (for both serial and parallel)?

-Anil.


On Wed, Mar 20, 2019 at 8:59 AM Bruce Schuchardt 
wrote:

> We've seen situations where the receiving side of a WAN gateway is slow
> to accept data or is not accepting any data.  This can cause queues to
> fill up on the sending side.  If disk-overflow is being used this can
> even lead to an outage.  Some users are concerned more with the latest
> data and don't really care if old data is thrown away in this
> situation.  They may have set a TTL on their Regions and would like to
> be able to do the same thing with their GatewaySenders.
>
> With that in mind I'd like to add this method to GatewaySenderFactory:
>
> /**
>  * Sets the timeToLive expiration attribute for queue entries for the next
>  * {@code GatewaySender} created.
>  *
>  * @param timeToLive the timeToLive ExpirationAttributes for entries in this region
>  * @return a reference to this GatewaySenderFactory object
>  * @throws IllegalArgumentException if timeToLive is null
>  * @see RegionFactory#setEntryTimeToLive
>  */
> public GatewaySenderFactory setEntryTimeToLive(ExpirationAttributes timeToLive);
>
> The exact implementation may not be the same as for Regions since we
> probably want to expire the oldest entries first and make sure we do so
> in their order in the queue.
>
>


Re: [DISCUSS] Proposal to re-cut Geode 1.9.0 release branch

2019-03-19 Thread Anilkumar Gingade
+1 to re-cut.

-Anil.


On Tue, Mar 19, 2019 at 2:11 PM Dick Cavender  wrote:

> +1 to re-cutting the 1.9 release branch off a more stable develop sha
> within the last couple days.
>
> On Tue, Mar 19, 2019 at 1:14 PM Bruce Schuchardt 
> wrote:
>
> > If we recut the release branch we need to update JIRA tickets marked
> > fixed in 1.10
> >
> > On 3/19/19 12:48 PM, Sai Boorlagadda wrote:
> > > > It was known at the time that develop was not as stable as desired,
> > > so we planned to cherry-pick fixes from develop until the release
> > > branch was stable enough to ship.
> > > I want to clarify that we decided to cut the release branch not because
> > > develop was unstable, but because it is desirable to cut the
> > > branch sooner to avoid any regression risk that could be introduced by
> > > on-going work on develop.
> > >
> > > Nevertheless, it looks like develop is more stable than the release
> > > branch, due to some test fixes that were not cherry-picked into it.
> > > I think it's a good idea to re-cut the branch, given our current goal
> > > of stabilizing the release branch before releasing.
> > >
> > > +1 to re-cut.
> > >
> > > Sai
> > >
> > > On Tue, Mar 19, 2019 at 12:19 PM Owen Nichols  > > > wrote:
> > >
> > > The Geode 1.9.0 release branch was originally cut 4 weeks ago on
> > > Feb 19.  It was known at the time that develop was not as stable
> > > as desired, so we planned to cherry-pick fixes from develop until
> > > the release branch was stable enough to ship.  While this is a
> > > good strategy when starting from a fairly good baseline, it seems
> > > in this case it has only added complexity without leading to
> > > stability.
> > >
> > > Looking at the pipelines over the last week (see attached
> > > metrics), it appears we have been far more successful at
> > > stabilizing /develop/ than /release/1.9.0/. Rather than trying to
> > > cherry-pick more and more fixes to the release branch, I propose
> > > we RE-CUT the 1.9.0 release branch later this week in order to
> > > start from a much more stable baseline.
> > >
> > > -Owen
> > >
> > >
> > >
> >
>


Re: [DISCUSS] Moving redis to a separate module

2019-03-12 Thread Anilkumar Gingade
+1

On Tue, Mar 12, 2019 at 5:32 PM John Blum  wrote:

> Definitely a reasonable change.  Perhaps, for consistency's sake, the same
> should be applied to Geode's Memcached support? (in another PR).
>
>
> On Tue, Mar 12, 2019 at 4:23 PM Dan Smith  wrote:
>
> > I created a PR to move our redis support to a separate module. Let me
> know
> > what you think:
> >
> > https://github.com/apache/geode/pull/3284
> >
> > Geode servers will still include redis on the classpath, so the only
> effect
> > of this is that if you are launching a server based on the maven
> > dependencies, you will need geode-core and geode-redis to launch a server
> > with redis.
> >
> > In addition to making it easier to find the redis specific code this also
> > removes 4 dependencies from geode-core.
> >
> > -Dan
> >
>
>
> --
> -John
> john.blum10101 (skype)
>


Re: A small proposal: Not Sorting in AnalyzeSerializablesJUnitTest

2018-11-13 Thread Anilkumar Gingade
If it makes it easy to find/address failures with AnalyzeSerializablesTest, +1

-Anil.


On Tue, Nov 13, 2018 at 9:34 AM Kirk Lund  wrote:

> +1 I've had to reorder the list a few times myself to correct the ordering
>
> On Mon, Nov 12, 2018 at 5:28 PM, Galen O'Sullivan 
> wrote:
>
> > Hi all,
> >
> > I wrote a PR (GEODE-5800) recently to remove redundant cases from
> > DataSerializer.readObject etc. calls. This changed the bytecode size (but
> > not the behavior) of a number of DataSerializables, and I realized that
> the
> > task of updating the list (or viewing the diff) was made harder by the
> fact
> > that our sanctionedDataSerializables list has gotten out of order. I
> would
> > like to propose forcing the list (and probably sanctionedSerializables as
> > well) to be ordered and the files to be equal, as I see no benefit to
> > having the files out of order, and I do see a benefit to having
> > configuration files like this rigidly defined so that we can analyze and
> > read diffs better.
> >
> > Thoughts?
> >
> > Thanks,
> > Galen
> >
>


Re: Geode Native & Apache Geode 1.8 Release

2018-10-10 Thread Anilkumar Gingade
Good work team.
+1 to get this as part of Geode 1.8 release.
It will be good to see the community taking advantage of this, and building
new native client apps.
I assume it will have all the docs about client-server version-compatibility
info, and a framework for backward-compatibility testing with new Geode
releases.

-Anil.



On Wed, Oct 10, 2018 at 12:02 PM Ernest Burghardt 
wrote:

> +1 for a source release
>
>
> On Wed, Oct 10, 2018 at 12:59 PM Anthony Baker  wrote:
>
> > I think starting with a source-only release of the native client is a
> good
> > first step.  That lets us focus on verifying that all the tasks outlined
> in
> > [1] are complete and correct.
> >
> > Anthony
> >
> > [1] https://issues.apache.org/jira/browse/GEODE-1416
> >
> >
> > > On Oct 10, 2018, at 11:52 AM, Dan Smith  wrote:
> > >
> > > That is awesome! Let's get it in!
> > >
> > > I think there are some details to work out:
> > > - Do we need to build any automation for creating the native source
> > > release (similar to ./gradlew srcDist on the java side)?
> > > - Will we release binaries? Which platforms, and how does the release
> > > manager build them?
> > > - How do we verify the NC code - can we create a public pipeline?
> > >
> > > Shipping these native APIs will be a great improvement!
> > >
> > > -Dan
> > >
> > > On Wed, Oct 10, 2018 at 8:41 AM Addison Huddy 
> wrote:
> > >
> > >> Hi,
> > >>
> > >> The Geode Native components (https://github.com/apache/geode-native)
> > have
> > >> made tremendous progress since its original donation to Apache.  The
> > >> project is nearing a release candidate and I propose that the *first
> > >> official release of Geode Native be included in Apache Geode 1.8.*
> > >>
> > >> Since donation, the project has
> > >>
> > >>   - modernized its C++ API based on C++ 11 standards
> > >>   - refactored away the cache singleton to allow for more flexible
> > >>   architectures and client-side data modeling
> > >>   - refactored the serializable interfaces (DataSerializable,
> > >>   PdxSerializable, DataSerializableFixedId) to make object
> serialization
> > >>   more straight-forward
> > >>   - created several examples on how to use the client (
> > >>   https://github.com/apache/geode-native/tree/develop/examples)
> > >>
> > >> In all, the project has closed 285 JIRA tickets since donation.
> > >>
> > >> If you want to learn more about the Geode Native, check out these two
> > >> Apache Geode By Example videos.
> > >>
> > >> .NET: https://www.youtube.com/watch?v=-LQYNJNQ7B4=3s
> > >>
> > >> C++: https://www.youtube.com/watch?v=KJciEcFRdtY=1s
> > >>
> > >> Looking forward to hearing your input on including the first cut of
> > Geode
> > >> Native in Apache Geode 1.8.
> > >>
> > >>
> > >> Best,
> > >> Addison
> > >>
> >
> >
>


Re: [DISCUSS] Predictable minor release cadence

2018-10-04 Thread Anilkumar Gingade
If I remember from earlier discussion, the plan was to deliver a release
once every 3 months. But from past release history we had difficulty
achieving that; either the features were not completely ready or the
bug-fixes took more time. We need to verify what is right for Apache
Geode (3, 4, or 6 months), and whether there is any community dev activity
that depends on the Geode release cadence.
My vote will be for 4 or 6 months, as it provides at least 3+ months for dev
activity and 1 month for QA.

-Anil.


On Thu, Oct 4, 2018 at 2:43 PM Dan Smith  wrote:

> +1 I definitely like the idea of scheduled releases.
>
> I wonder if cutting the release branch a month ahead of time is overkill,
> but I guess we do seem to keep finding issues after the branch is cut.
>
> -Dan
>
> On Thu, Oct 4, 2018 at 1:25 PM Alexander Murmann 
> wrote:
>
> > Hi everyone,
> >
> > I want to propose shipping Geode on a regular cadence. My concrete
> proposal
> > is to ship Geode every 3 months on the first weekday. To make sure we hit
> > that date we would cut the release 1 months prior to that day.
> >
> > *Why?*
> > Knowing on what day the release will get cut and on what day we ship
> allows
> > community members to plan their contributions. If I want my feature to be
> > in the next release I know by when I need to have it merged to develop
> and
> > can plan accordingly. As a user who is waiting for a particular feature
> or
> > fix that's already on develop, I know when to expect the release that
> > includes this work and again, can plan accordingly.
> >
> > This makes working and using Apache Geode more predictable which makes
> all
> > our lives less stressful. To make this work, it's important to be strict
> > about cutting the release branch on the set date and only allow critical
> > fixes after the release has been cut. Once we start compromising on this,
> > we go down a slippery slope that ultimately leads to not getting the
> > predictability benefits described here.
> >
> > Some other successful Apache projects share similar approaches:
> >
> >- Kafka
> ><
> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan>
> >releases every 4 months and cuts the release 1 month prior
> >- PredictionIO 
> > releases
> >every 2 months
> >- Spark  does not
> seem
> >to have a hard date, but aims to ship every 6 months, so there is at
> > least
> >a goal date
> >
> >
> > *What?*
> > As stated above, I suggest releasing every three months. Given we just
> > shipped, the next release would go out on January 2nd. That timing is
> > unfortunate, due to the holidays. Therefore I propose to aim for a
> December
> > 3rd (1st Monday in December) release. In order to meet that date, we
> should
> > cut the release branch on November 1st. That also means that we should
> > start finding a volunteer to manage the release on October 25th. I know
> > this seems really close, given we just shipped, but keep in mind that
> this
> > is to avoid the holidays and that we already have close to a month worth
> of
> > work on develop.
> >
> > *Proposed near future schedule:*
> > October 25th: Find release manager
> > November 1st: Cut 1.8 release branch
> > December 1st: Ship 1.8
> > January 28th: Find release manager
> > February 1st: Cut 1.9 release branch
> > March 1st: Ship 1.9
> > and so on...
> >
> > Thanks for sharing your thoughts and feedback on this!
> >
>


Re: 2 minute gateway startup time due to GEODE-5591

2018-09-04 Thread Anilkumar Gingade
We should fix this for the release.
-Anil.


On Tue, Sep 4, 2018 at 5:09 PM Udo Kohlmeyer  wrote:

> Imo (and I'm coming in cold)... We are NOT officially supporting Alpine
> linux (yet), which is the basis for this ticket, maybe push this to a
> later release?
>
> I prefer us getting out the fixes we have and release a more optimal
> version of GEODE-5591 later.
>
> IF this is a bug that will affect us on EVERY linux distro, then we
> should fix, otherwise, I vote to push it to 1.8
>
> --Udo
>
>
> On 9/4/18 16:38, Dan Smith wrote:
> > Spitting this into a separate thread.
> >
> > I see the issue. The two-minute timeout is in the constructor for
> > AcceptorImpl, where it retries the bind for 2 minutes.
> >
> > That behavior makes sense for CacheServer.start.
> >
> > But it doesn't make sense for the new logic in GatewayReceiver.start()
> from
> > GEODE-5591. That code is trying to use CacheServer.start to scan for an
> > available port, trying each port in a range. That free port finding logic
> > really doesn't want to have two minutes of retries for each port. It
> seems
> > like we need to rework the fix for GEODE-5591.
> >
> > Does it make sense to hold up the release to rework this fix, or should
> we
> > just revert it? Have we switched concourse over to using alpine linux,
> > which I think was the original motivation for this fix?
> >
> > -Dan
> >
> > On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith  wrote:
> >
> >> Why is it waiting at all in this case? Where is this 2 minute timeout
> >> coming from?
> >>
> >> -Dan
> >>
> >> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <
> sai.boorlaga...@gmail.com
> >>> wrote:
> >>> So the issue is that it takes longer to start than previous releases?
> >>> Also, is this wait time only when using Gfsh to create
> gateway-receiver?
> >>>
> >>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag  wrote:
> >>>
>  Currently we have a minor issue in the release branch as pointed out
> by
>  Barry O.
>  We will wait till a resolution is figured out for this issue.
> 
>  Steps:
>  1. create locator
>  2. start server --name=server1 --server-port=40404
>  3. start server --name=server2 --server-port=40405
>  4. create gateway-receiver --member=server1
>  5. create gateway-receiver --member=server2 `This gets stuck for 2
> >>> minutes`
>  Is the 2 minute wait time acceptable? Should we document it? When we
> >>> revert
>  GEODE-5591, this issue does not happen.
> 
>  Regards
>  Nabarun Nag
> 
>
>
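The distinction Dan draws above (a bind retry loop is right for CacheServer.start, wrong for port scanning) can be illustrated with a fail-fast scan. This is not Geode's internal code, just a sketch of finding a free port in a range where each busy port is skipped immediately instead of being retried for two minutes:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: find a free port in [start, end] with fail-fast bind attempts.
// This illustrates the behavior GatewayReceiver.start() wants when
// scanning a port range; it is not Geode's actual implementation, which
// at the time of this thread reused CacheServer.start (and its 2-minute
// bind retry loop) for every candidate port.
public class FreePortScanSketch {
    static int findFreePort(int start, int end) {
        for (int port = start; port <= end && port <= 65535; port++) {
            // try-with-resources: bind, then release immediately.
            try (ServerSocket s = new ServerSocket(port)) {
                return port; // bind succeeded: port is free right now
            } catch (IOException inUse) {
                // Port busy: move on at once instead of retrying this
                // same port for two minutes.
            }
        }
        return -1; // no free port in the range
    }

    public static void main(String[] args) throws IOException {
        // Hold one port open, then scan a range starting at it; the scan
        // should skip the busy port and land on a later free one.
        try (ServerSocket busy = new ServerSocket(0)) {
            int taken = busy.getLocalPort();
            int found = findFreePort(taken, taken + 50);
            System.out.println(found != taken && found > 0);
        }
    }
}
```

Note the usual caveat with scan-then-use: another process can grab the port between the probe and the real bind, so production code still needs to handle a bind failure at startup.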


Re: [DISCUSS] Apache Geode 1.7.0 release branch created

2018-09-04 Thread Anilkumar Gingade
It's not gfsh-specific. It's in the GatewayReceiver start.

It looks like the changes for GEODE-5591 still hit the earlier issue (the
one it was fixing) if the port is the same as the port returned by
"getPortToStart()", which was removed. I may be wrong.

-Anil.


On Tue, Sep 4, 2018 at 4:39 PM Sai Boorlagadda 
wrote:

> So the issue is that it takes longer to start than previous releases?
> Also, is this wait time only when using Gfsh to create gateway-receiver?
>
> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag  wrote:
>
> > Currently we have a minor issue in the release branch as pointed out by
> > Barry O.
> > We will wait till a resolution is figured out for this issue.
> >
> > Steps:
> > 1. create locator
> > 2. start server --name=server1 --server-port=40404
> > 3. start server --name=server2 --server-port=40405
> > 4. create gateway-receiver --member=server1
> > 5. create gateway-receiver --member=server2 `This gets stuck for 2
> minutes`
> >
> > Is the 2 minute wait time acceptable? Should we document it? When we
> revert
> > GEODE-5591, this issue does not happen.
> >
> > Regards
> > Nabarun Nag
> >
> > On Tue, Sep 4, 2018 at 10:50 AM Nabarun Nag  wrote:
> >
> > > Status Update on release process for 1.7.0
> > > - checkPom files are being modified to have version as 1.7.0 instead of
> > > 1.8.0-SNAPSHOT
> > > - gradle.properties file has been modified to reflect 1.7.0 as the
> > version.
> > > - Version.java has been reverted to remove all changes corresponding to
> > > 1.8.0
> > > - CommandInitializer.java has been reverted to remove changes for 1.8.0
> > > - LuceneIndexCommandsJUnitTest.java has been modified to change
> > > Version.GEODE_180 to GEODE_170
> > > - LuceneIndexCommands.java has been modified to change
> Version.GEODE_180
> > > to GEODE_170
> > > -TXCommitMessage.java has been modified to change Version.GEODE_180 to
> > > GEODE_170
> > >
> > > I will be getting in touch with the individual developers to verify my
> > > changes.
> > > The branch will be updated once we get a green light on these changes.
> > >
> > > Still need updates on these tickets:
> > >
> > > GEODE-5600 - [Patrick Rhomberg]
> > > GEODE-5578 - [Robert Houghton]
> > > GEODE-5492 - [Robert Houghton]
> > > GEODE-5280 - [xiaojian zhou & Biju Kunjummen]
> > >
> > > These tickets have commits into develop but they are still open with
> fix
> > > version as 1.8.0
> > >
> > > Regards
> > > Nabarun Nag
> > >
> > >
> > >
> > > On Fri, Aug 31, 2018 at 3:38 PM Dale Emery  wrote:
> > >
> > >> I have resolved GEODE-5254
> > >>
> > >> Dale
> > >>
> > >> > On Aug 31, 2018, at 3:34 PM, Nabarun Nag  wrote:
> > >> >
> > >> > Requesting status update on the following JIRA tickets. These
> tickets
> > >> have
> > >> > commits into develop against its name but the status is still open /
> > >> > unresolved.
> > >> >
> > >> > GEODE-5600 - [Patrick Rhomberg]
> > >> > GEODE-5578 - [Robert Houghton]
> > >> > GEODE-5492 - [Robert Houghton]
> > >> > GEODE-5280 - [xiaojian zhou & Biju Kunjummen]
> > >> > GEODE-5254 - [Dale Emery]
> > >> >
> > >> > GEODE-4794 - [Sai]
> > >> > GEODE-5594 - [Sai]
> > >> >
> > >> > Regards
> > >> > Nabarun Nag
> > >> >
> > >> >
> > >> > On Fri, Aug 31, 2018 at 3:18 PM Nabarun Nag 
> wrote:
> > >> >
> > >> >>
> > >> >> Please continue using 1.7.0 as a fix version in JIRA till the email
> > >> comes
> > >> >> in that the 1.7.0 release branch has be cut.
> > >> >>
> > >> >> Changing the fixed version for the following tickets to 1.7.0 from
> > >> 1.8.0
> > >> >> as these fixes will be included in the 1.7.0 release
> > >> >>
> > >> >> GEODE-5671
> > >> >> GEODE-5662
> > >> >> GEODE-5660
> > >> >> GEODE-5652
> > >> >>
> > >> >> Regards
> > >> >> Nabarun Nag
> > >> >>
> > >> >>
> > >> >> On Fri, Aug 31, 2018 at 2:20 PM Nabarun Nag 
> wrote:
> > >> >>
> > >> >> A new get/set cluster config feature was added to gfsh.
> > >> >>> This needs to be added to the documentation.
> > >> >>> Once this is done, the branch will be ready.
> > >> >>>
> > >> >>> Regards
> > >> >>> Nabarun
> > >> >>>
> > >> >>>
> > >> >>> On Fri, Aug 31, 2018 at 2:15 PM Alexander Murmann <
> > >> amurm...@pivotal.io>
> > >> >>> wrote:
> > >> >>>
> > >>  Nabarun, do you still see anything blocking cutting the release
> at
> > >> this
> > >>  point?
> > >> 
> > >>  Maybe we can even get a pipeline going today? 
> > >> 
> > >>  On Fri, Aug 31, 2018 at 10:38 AM, Sai Boorlagadda <
> > >>  sai.boorlaga...@gmail.com
> > >> > wrote:
> > >> 
> > >> > We can go ahead and cut 1.7 with out GEODE-5338 as I don't have
> > the
> > >>  code
> > >> > ready.
> > >> >
> > >> > GEODE-5594 adds a new flag to enable hostname validation and is
> > >> > disabled by default, so we are good with the changes that are
> > >> > already merged; documentation for GEODE-5594 is already merged.
> > >> >
> > >> > Naba, after the branch is cut we should delete windows jobs from
> > the
> > 

Re: [DISCUSS] Streamline return value from RemoteQuery

2018-08-14 Thread Anilkumar Gingade
In Java, they are separated so that the results can be managed effectively.
For example, StructSet has its own implementation to manage query
results that are structs (more than one projection attribute).

-Anil



On Tue, Aug 14, 2018 at 8:28 AM David Kimura  wrote:

> I have a couple questions:
>
> Do you have any ideas or theories about the original intent behind
> separating ResultSet and StructSet?
>
> Is execute a blocking or non-blocking call and does the interface have any
> guarantee of that?
>
> Thanks,
> David
>
> On Mon, Aug 13, 2018 at 4:06 PM, Ernest Burghardt 
> wrote:
>
> > Currently, geode-native's query::execute returns a
> > shared_ptr and
> > that pointer can be either ResultSet or StructSet.
> >
> >
> > RemoteQuery::execute contains logic to determine whether the QueryResults
> > are greater than 0... We should look at removing this logic and only
> > returning StructSets
> > This allows removal of ResultSet which will streamline the API and
> > associated code...
> >
> > This duality is unnecessary and should be removed.
> > I am proposing that the results only return a StructSet.
> >
> > Best,
> > EB
> >
>
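For readers unfamiliar with the two result shapes being debated: an OQL query with a single projection attribute ("SELECT e.age FROM ...") yields bare values, while multiple projections ("SELECT e.name, e.age FROM ...") yield Structs (named field tuples), which is why the Java side separates ResultSet and StructSet. A simplified model of that duality, using stand-in types rather than Geode's actual org.apache.geode.cache.query classes:

```java
import java.util.List;
import java.util.Map;

// Simplified model of the ResultSet/StructSet duality discussed above.
// These are stand-in types for illustration only; they show why one
// projection attribute yields bare values while several yield named
// tuples (Structs), and why the struct shape can subsume the other.
public class QueryResultShapesSketch {
    // "SELECT e.age FROM /employees e" -> one projection, bare values.
    static List<Object> singleProjection() {
        return List.of(34, 41);
    }

    // "SELECT e.name, e.age FROM /employees e" -> structs: each row is a
    // field-name -> value mapping.
    static List<Map<String, Object>> multiProjection() {
        return List.of(
            Map.of("name", "ada", "age", 34),
            Map.of("name", "bob", "age", 41));
    }

    public static void main(String[] args) {
        // A single-projection row is just a one-field struct, which is the
        // basis of the streamlining proposal: always return the struct
        // shape and drop the separate bare-value result type.
        for (Map<String, Object> row : multiProjection()) {
            System.out.println(row.get("name") + ":" + row.get("age"));
        }
        System.out.println(singleProjection().size()); // 2
    }
}
```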


  1   2   3   >