Re: About Geode rolling downgrade

2020-04-22 Thread Anilkumar Gingade
That's right: a no-downtime requirement is almost always met by running
replicated cluster setups (a disaster-recovery/backup site). The data is
pushed to both systems either by the data ingesters or through a WAN
setup.
The clusters are upgraded one at a time, so if an upgrade fails or
needs to be rolled back, one system is always up and running.

-Anil.







Re: About Geode rolling downgrade

2020-04-22 Thread Anthony Baker
Anil, let me see if I understand your perspective by stating it this way:

In cases where 100% uptime is a requirement, users are almost always running a
disaster recovery site.  It could be active/active or active/standby, but there
are already at least 2 clusters with current copies of the data.  If an upgrade
goes badly, the clusters can be downgraded one at a time without loss of
availability.  This is because we ensure compatibility across the WAN protocol.

Is that correct?


Anthony






[DISCUSS] preparing for Geode 1.13.0 release

2020-04-22 Thread Owen Nichols
Geode is scheduled to cut support/1.13 on May 4, as per the quarterly release 
schedule approved [1] in 2018 and affirmed in last month’s “Shipping patch 
releases” RFC [2].

Please volunteer if you are interested in serving as Release Manager for Geode 
1.13.0.

[1] 
https://lists.apache.org/thread.html/8626f7cc73b49cc90129ec5f6021adab3815469048787032935bfc1e%40%3Cdev.geode.apache.org%3E
[2] https://cwiki.apache.org/confluence/display/GEODE/Shipping+patch+releases

Re: About Geode rolling downgrade

2020-04-22 Thread Anilkumar Gingade
>> Rolling downgrade is a pretty important requirement for our customers
>> I'd love to hear what others think about whether this feature is worth
>> the overhead of making sure downgrades can always work.

I/We haven't seen users/customers requesting rolling downgrade as a
critical requirement; most of the time they had both an old and a
new setup, so they could upgrade or switch back to the older setup.
Considering the amount of work involved and the code complexity it brings in,
and since there are other ways to downgrade, it is hard to justify supporting
this feature.

-Anil.





On Tue, Apr 21, 2020 at 2:01 PM Dan Smith  wrote:

> > Anyhow, we wonder what would be as of today the recommended or official
> > way to downgrade a Geode system without downtime and data loss?
>
> I think the no-downtime option is difficult right now. The most bulletproof
> way to downgrade without data loss is probably just to export/import
> the data, but that involves downtime. In many cases, you could restart the
> system with an old version if you have persistent data, because the on-disk
> format doesn't change that often, but that won't work in all cases. Or, if
> you have multiple redundant WAN sites, you could potentially shift traffic
> from one to the other and recreate a WAN site, but that also requires some
> work.
>
> > Rolling downgrade is a pretty important requirement for our customers,
> > so we would not like to close the discussion here and instead try to see
> > if it is still reasonable to propose it for Geode, perhaps relaxing the
> > expectations a bit and clarifying some things.
>
> I agree that rolling downgrade is a useful feature for some cases. I also
> agree we would need to add a lot of tests to make sure we really can
> support it. I'd love to hear what others think about whether this feature
> is worth the overhead of making sure downgrades can always work. As Bruce
> pointed out, we have made changes in the past and we will make changes in
> the future that may need additional logic to support downgrades.
>
> Regarding your downgrade steps, they look reasonable. You might consider
> downgrading the servers first. Rolling *upgrade* upgrades the locators
> first, so up to this point we have only tested a newer locator with an
> older server.
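The downgrade ordering discussed in this thread (Dan's suggestion to downgrade servers before locators, plus Bruce's note, quoted further down, that all clients must be rolled back before any server) can be sketched as a small planning helper. This is a hypothetical illustration in plain Java, not a Geode API: `DowngradePlan`, `Role`, and `Member` are invented names, and the helper only orders members; it does not stop or restart anything.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Geode: orders members for a rolling downgrade.
final class DowngradePlan {

  // Declaration order is the downgrade order suggested in the thread:
  // clients first, then servers, then locators.
  enum Role { CLIENT, SERVER, LOCATOR }

  static final class Member {
    final String name;
    final Role role;

    Member(String name, Role role) {
      this.name = name;
      this.role = role;
    }
  }

  // Returns the members in the order they would be downgraded, one at a time.
  // List.sort is stable, so members with the same role keep their input order.
  static List<Member> order(List<Member> members) {
    List<Member> plan = new ArrayList<>(members);
    plan.sort(Comparator.comparing((Member m) -> m.role));
    return plan;
  }
}
```

For a cluster with `locator1`, `server1`, `client1`, and `server2`, the plan is `client1`, `server1`, `server2`, `locator1`.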
>
> -Dan
>
> On Mon, Apr 20, 2020 at 9:13 AM  wrote:
>
> > Hi,
> >
> > I agree that if we wanted to support limited rolling downgrade, some other
> > version interchange would need to be done and extra tests would be required.
> >
> > Nevertheless, this could be done using gfsh or with a startup parameter.
> > For example, in the case you mentioned about UDP messaging, a
> > command like "enable UDP messaging" could put the system back in a state
> > equivalent to "upgrade in progress but not yet completed", which would
> > allow old members to join again.
> > I guess each case would have its particularities, but they should not
> > involve a lot of effort because most of the mechanisms needed (the ones
> > that allow old and new members to coexist) will have been developed for
> > the rolling upgrade.
> >
> > Anyhow, we wonder what would be as of today the recommended or official
> > way to downgrade a Geode system without downtime and data loss?
> >
> >
> > 
> > From: Bruce Schuchardt 
> > Sent: Friday, April 17, 2020 11:36 PM
> > To: dev@geode.apache.org 
> > Subject: Re: About Geode rolling downgrade
> >
> > Hi Alberto,
> >
> > I think that if we want to support limited rolling downgrade some other
> > version interchange needs to be done and there need to be tests that
> > prove that the downgrade works.  That would let us document which versions
> > are compatible for a downgrade and enforce that no-one attempts it between
> > incompatible versions.
> >
> > For instance, there is work going on right now that introduces
> > communications changes to remove UDP messaging.  Once the rolling upgrade
> > completes, it will shut down insecure UDP communications.  At that point
> > there is no way to go back.  If you tried it, the old servers would try to
> > communicate with UDP but the new servers would not have UDP sockets open
> > for security reasons.
> >
> > As a side note, clients would all have to be rolled back before starting
> > in on the servers.  Clients aren't equipped to talk to an older version
> > server, and servers will reject the client's attempts to create
> > connections.
> >
> > On 4/17/20, 10:14 AM, "Alberto Gomez"  wrote:
> >
> > Hi Bruce,
> >
> > Thanks a lot for your answer. We had not thought about the changes in
> > distributed algorithms when analyzing rolling downgrades.
> >
> > Rolling downgrade is a pretty important requirement for our customers,
> > so we would not like to close the discussion here and instead try to see
> > if it is still reasonable to propose it for Geode, perhaps relaxing the
> > expectations a bit and clarifying some things.
> >
> > First, I think supporting rolling downgrade does not 

Re: Data ingestion with predefined buckets

2020-04-22 Thread Anthony Baker
Steve,

Have you looked at splitting your putAll() requests into batches that align with
Geode’s buckets?  In your application code, you can determine the hash for each
data item and self-partition the entries.  This allows you to send the requests
on separate threads in parallel while optimizing network traffic.

I have seen this used for very high-throughput ingest use cases.
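A minimal Java sketch of this self-partitioning idea. It assumes the default bucket mapping is `abs(key.hashCode() % totalNumBuckets)` with no `PartitionResolver` configured (verify this against the Geode version in use), and that keys have a `hashCode()` that is deterministic across JVMs (e.g. `String`, `Integer`). `BucketAlignedIngest` is a hypothetical name, and the `putAll` callback stands in for `region::putAll` so the sketch runs without a cluster. Pass the region's configured total-num-buckets (113 by default).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical client-side helper; not a Geode API.
final class BucketAlignedIngest {

  // ASSUMPTION: mirrors Geode's default routing (no PartitionResolver).
  // Verify against the Geode version in use before relying on it.
  static int bucketOf(Object key, int totalBuckets) {
    // |h % n| < n, so Math.abs cannot overflow even for Integer.MIN_VALUE.
    return Math.abs(key.hashCode() % totalBuckets);
  }

  // Splits the entries into one map per (assumed) bucket id.
  static <K, V> Map<Integer, Map<K, V>> groupByBucket(Map<K, V> entries, int totalBuckets) {
    if (totalBuckets <= 0) {
      throw new IllegalArgumentException("totalBuckets must be positive");
    }
    Map<Integer, Map<K, V>> groups = new HashMap<>();
    for (Map.Entry<K, V> e : entries.entrySet()) {
      groups.computeIfAbsent(bucketOf(e.getKey(), totalBuckets), b -> new HashMap<>())
          .put(e.getKey(), e.getValue());
    }
    return groups;
  }

  // Sends each bucket's group from its own task; putAll stands in for region::putAll.
  static <K, V> void ingest(Map<K, V> entries, int totalBuckets, int threads,
      Consumer<Map<K, V>> putAll) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (Map<K, V> group : groupByBucket(entries, totalBuckets).values()) {
        pending.add(pool.submit(() -> putAll.accept(group)));
      }
      for (Future<?> f : pending) {
        f.get(); // waits, and rethrows any putAll failure as ExecutionException
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

With a real client region this would be called along the lines of `BucketAlignedIngest.ingest(rows, 113, 8, region::putAll)`. If the bucket formula cannot be confirmed, a `PartitionResolver` that returns an application-defined routing object keeps the client and server views of "which entries belong together" under your control.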

Anthony


> On Apr 16, 2020, at 11:09 AM, Anilkumar Gingade  wrote:
> 
>>> PutAllPRMessage.*
> 
> These are internal APIs/message protocols used to handle PartitionedRegion
> messages.
> The messages are sent from the originating node to peer nodes to operate on a
> given partitioned region; they are not intended as application APIs.
> 
> We could consider looking at the code that determines the bucket id for each
> of the putAll keys. If there is routing info that identifies a common data
> store (bucket), the code could be optimized there...
> 
> My recommendation is still to use the existing APIs and tune the
> putAll map size. By reducing the map size, you will be pushing small chunks
> of data to the server while the remaining data is acted upon (at the client),
> which keeps both client and server busy at the same time. You can also look
> at tuning the socket buffer size to fit your data size, so that the data is
> written/read in a single chunk.
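Anil's first suggestion, tuning the putAll map size, amounts to splitting one large map into fixed-size batches and issuing one `putAll` per batch. A minimal sketch, assuming a plain `Map` of entries; `PutAllBatches.chunk` is a hypothetical helper, and 100 is only the starting size suggested elsewhere in this thread. Whether client and server work actually overlap depends on how the batches are issued (for example, from several threads); the helper itself only does the splitting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Geode API: splits one large putAll map into batches.
final class PutAllBatches {

  static <K, V> List<Map<K, V>> chunk(Map<K, V> entries, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Map<K, V>> batches = new ArrayList<>();
    // LinkedHashMap keeps each batch in the source map's iteration order.
    Map<K, V> current = new LinkedHashMap<>();
    for (Map.Entry<K, V> e : entries.entrySet()) {
      current.put(e.getKey(), e.getValue());
      if (current.size() == batchSize) {
        batches.add(current);
        current = new LinkedHashMap<>();
      }
    }
    if (!current.isEmpty()) {
      batches.add(current); // final, partially filled batch
    }
    return batches;
  }
}
```

Usage would look like `for (Map<K, V> batch : PutAllBatches.chunk(entries, 100)) { region.putAll(batch); }`, adjusting the batch size while watching time spent in `BucketRegion.waitUntilLocked`.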
> 
> -Anil
> 
> 
> On Wed, Apr 15, 2020 at 7:01 PM steve mathew 
> wrote:
> 
>> Anil, yes, it's a kind of custom hash (which involves calculating a hash on all
>> fields of the row). I have to stick to the predefined mechanism based on which
>> the source files are generated.
>> 
>> It would be a great help if someone could point me to any available
>> *server-side internal API that provides bucket-level data ingestion*. While
>> exploring, I came across "*PartitionRegion.sendMsgByBucket(bucketId,
>> PutAllPRMessage)*". Does this API internally take care of redundancy
>> (ingestion into secondary buckets on peer nodes)?
>> 
>> Can someone explain
>> *PutAllPRMessage.operateOnPartitionedRegion(ClusterDistributionManager
>> dm, PartitionedRegion pr,..)*? It seems this handles a putAll message from a
>> peer. When is this required?
>> 
>> Thanks
>> 
>> Steve M.
>> 
>> On Wed, Apr 15, 2020 at 11:06 PM Anilkumar Gingade 
>> wrote:
>> 
>>> About the API: I would not recommend using bucketId in the API, as it is
>>> internal, and there are other internal/external APIs that rely on bucket-id
>>> calculations, which could be compromised here.
>>> 
>>> Instead of adding new APIs, probably looking at minimizing/reducing the
>>> time spent may be a good start.
>>> 
>>> BucketRegion.waitUntilLocked - A putAll thread could spend time here when
>>> there are multiple threads acting upon the same bucket; one way to reduce
>>> this is by tuning the putAll size. Can you try changing your putAll size
>>> (say, start with 100)?
>>> 
>>> I am wondering about the time spent in hashCode(); is it custom code?
>>> 
>>> If you want to create the buckets upfront, you can try calling the method:
>>> PartitionRegionHelper.assignBucketsToPartitions().
>>> 
>>> -Anil
>>> 
>>> 
>>> On Wed, Apr 15, 2020 at 8:37 AM steve mathew 
>>> wrote:
>>> 
>>>> Thanks Den, Anil and Udo for your inputs. Extremely sorry for the late reply, as
>>>> I took a bit of time to explore and understand Geode internals.
>>>>
>>>> It seems the BucketRegion/bucket terminology is not exposed to users, but I am
>>>> still trying to achieve something that is uncommon and for which no client API
>>>> is exposed.
>>>>
>>>> *Details about the use case/client*
>>>> - MultiThreadClient - Each task performs data ingestion on a specific bucket.
>>>> Each task knows the bucket number to ingest data into. In short, the client
>>>> knows the task-->bucket mapping.
>>>> - Each task iteratively ingests data in batches (configurable) of 1000
>>>> records into the bucket assigned to it.
>>>> - Parallelism is achieved by running multiple tasks concurrently.
>>>>
>>>>
>>>> *When I tried the existing R.putAll() API, I observed slow performance;
>>>> related observations are:*
>>>> - A few tasks take quite a long time (a thread dump
>>>> shows --> Thread WAITING on BucketRegion.waitUntilLocked), hence the
>>>> overall client takes longer.
>>>> - Code profiling shows a good amount of time spent in hash-code
>>>> calculation. It seems key.hashCode() gets calculated on both client and
>>>> server, which is not required for my use case, as the task-->bucket
>>>> mapping is known beforehand.
>>>> - The putAll() client implementation takes care of parallelism (using a
>>>> PRMetadata-enabled thread pool and reshuffling the keys internally), but in
>>>> my case that's taken care of by multiple tasks, one per bucket, within my
>>>> client.
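Since profiling points at hash-code calculation and the custom hash covers all fields of a row, one common Java mitigation is an immutable key class that computes its hash once, in the constructor, and returns the cached value afterwards (similar to how `java.lang.String` caches its hash). A minimal sketch: `RowKey` is hypothetical, it assumes the fields are immutable values with stable hash codes (such as `String` or boxed numbers), and it only reduces the cost of repeated `hashCode()` calls; it does not bypass Geode's bucket routing.

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical immutable row key whose hash covers all fields but is computed once.
final class RowKey implements Serializable {
  private static final long serialVersionUID = 1L;

  private final Object[] fields;
  // Computed eagerly in the constructor. Assuming plain Java serialization, this
  // final field travels with the key, so a deserialized copy skips recomputation.
  private final int hash;

  RowKey(Object... fields) {
    this.fields = fields.clone(); // defensive copy keeps the key immutable
    this.hash = Arrays.hashCode(this.fields);
  }

  @Override
  public int hashCode() {
    return hash; // cached: no per-call iteration over the fields
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof RowKey)) {
      return false;
    }
    RowKey other = (RowKey) o;
    // Cheap hash comparison first, then the full field-by-field check.
    return hash == other.hash && Arrays.equals(fields, other.fields);
  }
}
```

Two keys built from the same field values are equal and share the cached hash, so they behave as the same map key on either side of the wire.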
 
>>>> *I have forked the Geode codebase and am trying to extend it by providing a
>>>> client API like:*
>>>> //Region.java
>>>> /**
>>>> * putAll records in specified bucket
>>>> */
>>>> *public void putAll(int bucketId, map) *
>>>>
>>>> Already added client side message and related code 

Re: Reconfiguring our notifications and more

2020-04-22 Thread Anthony Baker
Sorry for the delay, my day went very sideways yesterday.  I’ve pushed changes 
to master (and develop where needed) for the following repos:

geode
geode-benchmarks
geode-examples
geode-kafka-connector
geode-native
geode-site

I’m not entirely satisfied it’s doing everything we want (like linking JIRAs
to PRs) but at least it directs the notifications to the right ML.

Anthony

> On Apr 21, 2020, at 8:54 AM, Anthony Baker  wrote:
> 
> I’d like a quick round of feedback so we can stop the dev@ spamming [1].
> 
> ASF has implemented a cool way to give projects control of lots of things
> [2].  In short, you provide a .asf.yaml file in each and every GitHub repo to
> control lots of interesting things.  For us, the most immediately relevant are:
> 
> notifications:
>  commits: comm...@geode.apache.org
>  issues:  iss...@geode.apache.org
>  pullrequests:  notificati...@geode.apache.org
>  jira_options: link label comment
> 
> I’d like to commit this to /develop and cherry-pick over to /master ASAP.  
> Later on we can explore the website and GitHub sections.  Let me know what 
> you think.
> 
> Anthony
> 
> 
> [1] https://issues.apache.org/jira/browse/INFRA-20156
> [2] 
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-Notificationsettingsforrepositories



Fixed: apache/geode-examples#439 (develop - cc644fa)

2020-04-22 Thread Travis CI
Build Update for apache/geode-examples
-

Build: #439
Status: Fixed

Duration: 22 mins and 7 secs
Commit: cc644fa (develop)
Author: Anthony Baker
Message: Updated notifications configuration file

View the changeset: 
https://github.com/apache/geode-examples/compare/28f580168a00...cc644faadd61

View the full build log and details: 
https://travis-ci.org/github/apache/geode-examples/builds/678218603?utm_medium=notification_source=email




[GitHub] [geode] alb3rtobr commented on issue #4978: Fix for regression introduced by GEODE-7565

2020-04-22 Thread GitBox


alb3rtobr commented on issue #4978:
URL: https://github.com/apache/geode/pull/4978#issuecomment-617861442


   > One additional comment regarding the following warning message:
   > 
   > ```
   > [warn 2020/04/18 23:44:22.757 PDT  tid=0x298] Unable to ping non-member 
rs-FullRegression19040559a2i32xlarge-hydra-client-63(bridgegemfire1_host1_4749:4749):41003
 for client 
identity(rs-FullRegression19040559a2i32xlarge-hydra-client-63(edgegemfire3_host1_1071:1071:loner):50046:5a182991:edgegemfire3_host1_1071,connection=2
   > ```
   > 
   > The above is logged by member `bridgegemfire1_host1_4749` in the tests, 
exactly the member to which the `ping` command was sent... this warning is 
logged within the `pingCorrectServer` method which, unless I'm missing 
something, shouldn't be invoked at all if this member is the actual recipient 
for the `ping` message. Maybe `!myID.equals(targetServer)` is not working as 
expected?
   
   I have added a log right after the comparison to check which values were
considered different. I created two clusters with two servers, and in my
case I don't see the comparison failing. This is an example:
   ```
   [info 2020/04/22 15:05:01.338 GMT  
tid=0x56] ALBERTO - MyID=172.17.0.3(server-0:85):41000 - 
target=172.17.0.9(server-1:84):41000
   ```
   Could you add the same log to your code?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [geode] prettyClouds opened a new pull request #4984: Refactor: Split SetsIntegrationTest into multiple files

2020-04-22 Thread GitBox


prettyClouds opened a new pull request #4984:
URL: https://github.com/apache/geode/pull/4984


   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in 
the commit message?
   
   - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that, once the PR is submitted, you check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please send an
   email to dev@geode.apache.org.
   







[GitHub] [geode] jujoramos commented on issue #4978: Fix for regression introduced by GEODE-7565

2020-04-22 Thread GitBox


jujoramos commented on issue #4978:
URL: https://github.com/apache/geode/pull/4978#issuecomment-617814149


   One additional comment regarding the following warning message:
   
   ```
   [warn 2020/04/18 23:44:22.757 PDT  
tid=0x298] Unable to ping non-member 
rs-FullRegression19040559a2i32xlarge-hydra-client-63(bridgegemfire1_host1_4749:4749):41003
 for client 
identity(rs-FullRegression19040559a2i32xlarge-hydra-client-63(edgegemfire3_host1_1071:1071:loner):50046:5a182991:edgegemfire3_host1_1071,connection=2
   ```
   
   The above is logged by member `bridgegemfire1_host1_4749` in the tests, 
exactly the member to which the `ping` command was sent... this warning is 
logged within the `pingCorrectServer` method which, unless I'm missing 
something, shouldn't be invoked at all if this member is the actual recipient 
for the `ping` message. Maybe `!myID.equals(targetServer)` is not working as 
expected?







[GitHub] [geode] jdeppe-pivotal commented on issue #4861: GEODE-7905: RedisDistDUnitTest failing intermittently in CI

2020-04-22 Thread GitBox


jdeppe-pivotal commented on issue #4861:
URL: https://github.com/apache/geode/pull/4861#issuecomment-617771032


   Since we're actively working on various approaches to this, and it's 
probably still going to take a while, I'm going to close this PR for now.







[GitHub] [geode] jujoramos commented on a change in pull request #4978: Fix for regression introduced by GEODE-7565

2020-04-22 Thread GitBox


jujoramos commented on a change in pull request #4978:
URL: https://github.com/apache/geode/pull/4978#discussion_r412828019



##
File path: 
geode-core/src/main/java/org/apache/geode/cache/client/internal/PingOp.java
##
@@ -43,12 +45,16 @@ private PingOp() {
   static class PingOpImpl extends AbstractOp {
 
 private long startTime;
+private ServerLocation location;

Review comment:
   The above fields are not used anywhere; can we just remove them, or am
I missing something?

##
File path: 
geode-core/src/main/java/org/apache/geode/cache/client/internal/PingOp.java
##
@@ -65,8 +71,9 @@ protected boolean needsUserId() {
 @Override
 protected void sendMessage(Connection cnx) throws Exception {
   getMessage().clearMessageHasSecurePartFlag();
-  this.startTime = System.currentTimeMillis();
-  getMessage().send(false);
+  getMessage().setNumberOfParts(1);
+  getMessage().addObjPart(serverID);
+  getMessage().send(true);

Review comment:
   These operations can be executed directly within the `PingOpImpl`
constructor, as we do with the rest of the messages. I've also noticed that you
changed the `clearMessage` flag from `false` to `true`; any reason
behind that?









[GitHub] [geode] jujoramos commented on a change in pull request #4970: GEODE-7676: Add PR clear with expiration tests

2020-04-22 Thread GitBox


jujoramos commented on a change in pull request #4970:
URL: https://github.com/apache/geode/pull/4970#discussion_r412794182



##
File path: 
geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearWithExpirationDUnitTest.java
##
@@ -0,0 +1,537 @@
+/*

Review comment:
   > The test needs 30 minutes to finish all combinations. And many tests
took more than 30 seconds each.
   
   This is not a bad thing per se; I wanted to test all possible combinations,
so it's expected for the distributed test to take some time to finish. I'll
remove some combinations, though, so the overall time will be reduced.
   
   > Many combinations are unnecessary, for example we want to see the 
expiration tasks are cancelled, we don't care what expiration. Only when we 
want to see an expiration to be triggered, then we need some (not all) 
expiration types. In my opinion, we can hard code to use DESTROY expiration 
type only in this test.
   We don't have to verify clear from accessor or server. Some other tests have 
verified that. You can just use server to do clear.
   Region types can also be reduced to a few. We have other tests to verify 
that all the combination of region type can do clear successfully. We only need 
to verify the expiration tasks are cleared in this test.
   
   Having tests that verify several possible combinations is better than having
no tests at all, especially for the region types, as we might change the
implementation class in the future... if we do, this test might be able to
catch any regressions introduced, so I'd prefer to keep them all.
Regarding the `ExpirationAction`, I agree: I will remove the `INVALIDATE` one and
just use `DESTROY`.









[GitHub] [geode] jujoramos commented on issue #4978: Fix for regression introduced by GEODE-7565

2020-04-22 Thread GitBox


jujoramos commented on issue #4978:
URL: https://github.com/apache/geode/pull/4978#issuecomment-617642403


   @bschuchardt @alb3rtobr : I'm still analysing the issue and seeing whether I
can reproduce it consistently; so far it's proving quite elusive, as it only
fails once or twice in around 100 runs of my test.


