Membership log message questions
I'm looking through logs from a DUnit test that takes longer to run than I think it should. It looks to me like membership is chewing up more time than it should, and I have questions about some membership log messages.

Specifically, I'm worried about:

1) double-logging by vm1 when the shutdown message is received from vm0
2) double-logging by GMSHealthMonitor in vm0 AFTER it disconnected
3) vm1 KEEPS suspecting vm0 well after a) vm1 received the shutdown message from vm0, b) processed a view with vm0 departed and c) even logged that vm0 gracefully left. (vm1 seems a bit schizophrenic about vm0)

Below are the log messages with my comments interspersed. Please help me understand what's actually going on here and whether we're seeing undesired behavior (i.e. minor bugs)...

KIRK: vm0 is shutting down gracefully

[vm0] [info 2018/03/22 19:16:27.010 PDT tid=87] Shutting down DistributionManager 192.168.1.18(34057):32771.
[vm1] [info 2018/03/22 19:16:27.014 PDT tid=66] received leave request from 192.168.1.18(34057):32771 for 192.168.1.18(34057):32771

KIRK: vm1 states that vm0 gracefully left and logs it twice, which seems like a minor bug

[vm1] [info 2018/03/22 19:16:27.016 PDT tid=66] Member at 192.168.1.18(34057):32771 gracefully left the distributed cache: shutdown message received
[vm1] [info 2018/03/22 19:16:27.016 PDT tid=66] Member at 192.168.1.18(34057):32771 gracefully left the distributed cache: shutdown message received
[vm1] [info 2018/03/22 19:16:27.332 PDT tid=41] received new view: View[192.168.1.18(34059:locator):32770|3] members: [192.168.1.18(34059:locator):32770, 192.168.1.18(34058):32772{lead}] shutdown: [192.168.1.18(34057):32771]
[vm1] old view is: View[192.168.1.18(34059:locator):32770|2] members: [192.168.1.18(34059:locator):32770, 192.168.1.18(34057):32771{lead}, 192.168.1.18(34058):32772]

KIRK: even though vm1 just received a new view, it initiates suspect processing for vm0, which also seems like a minor bug

[vm1] [info 2018/03/22 19:16:28.186 PDT tid=41] Membership ignoring suspect request for SuspectMembersMessage [suspectRequests=[SuspectRequest [member=192.168.1.18(34058):32772, reason=Member isn't responding to heartbeat requests]]] from non-member 192.168.1.18:32771
[vm1] [info 2018/03/22 19:16:28.683 PDT tid=41] Membership ignoring suspect request for SuspectMembersMessage [suspectRequests=[SuspectRequest [member=192.168.1.18(34058):32772, reason=Member isn't responding to heartbeat requests]]] from non-member 192.168.1.18:32771
[vm1] [info 2018/03/22 19:16:29.180 PDT tid=41] Membership ignoring suspect request for SuspectMembersMessage [suspectRequests=[SuspectRequest [member=192.168.1.18(34058):32772, reason=Member isn't responding to heartbeat requests]]] from non-member 192.168.1.18:32771
[vm1] [info 2018/03/22 19:16:29.681 PDT tid=41] Membership ignoring suspect request for SuspectMembersMessage [suspectRequests=[SuspectRequest [member=192.168.1.18(34058):32772, reason=Member isn't responding to heartbeat requests]]] from non-member 192.168.1.18:32771
[vm1] [info 2018/03/22 19:16:30.183 PDT tid=41] Membership ignoring suspect request for SuspectMembersMessage [suspectRequests=[SuspectRequest [member=192.168.1.18(34059:locator):32770, reason=Member isn't responding to heartbeat requests]]] from non-member 192.168.1.18:32771

KIRK: vm0 surprisingly continues logging -- why is it suspecting other members? Why does it still have GMSHealthMonitor running? Why does it log the message twice? Again, logging it twice looks like a minor bug to me, unless all of this double logging that I'm seeing means membership is doing twice the work, which would be horrible (please put my mind to rest about this)

[vm0] [info 2018/03/22 19:06:40.319 PDT tid=41] All other members are suspect at this point
[vm0] [info 2018/03/22 19:06:40.320 PDT tid=41] All other members are suspect at this point

KIRK: after the above 2 messages vm1 continues logging about suspect (why? vm0 gracefully left!)

[vm1] [info 2018/03/22 19:16:30.682 PDT tid=41] Membership ignoring suspect request for SuspectMembersMessage [suspectRequests=[SuspectRequest [member=192.168.1.18(34059:locator):32770, reason=Member isn't responding to heartbeat requests]]] from non-member 192.168.1.18:32771
[vm1] [info 2018/03/22 19:16:31.181 PDT tid=41] Membership ignoring suspect request for SuspectMembersMessage [suspectRequests=[SuspectRequest [member=192.168.1.18(34059:locator):32770, reason=Member isn't responding to heartbeat requests]]] from non-member 192.168.1.18:32771
[vm1] [info 2018/03/22 19:16:31.680 PDT tid=41] Membership ignoring suspect request for SuspectMembersMessage [suspectRequests=[SuspectRequest [member=192.168.1.18(34059:locator):32770, reason=Member isn't responding to heartbeat requests]]] from non-member 192.168.1.18:32771
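The double-logging question in the message above could be checked mechanically across a whole test log rather than by eye. What follows is only a minimal sketch under stated assumptions: it assumes lines shaped like the `[vm] [level timestamp zone tid=N] message` samples quoted above, and the class name `DuplicateLogFinder`, the method `consecutiveDuplicates`, and the rule "same vm, same tid, same message, within a small time gap" are all hypothetical, not anything that exists in Geode. It reports repeated log lines only; it cannot tell whether the underlying work was done twice.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Geode: flags back-to-back repeated log messages. */
class DuplicateLogFinder {
  // Assumed line shape: "[vm1] [info 2018/03/22 19:16:27.016 PDT tid=66] message text"
  private static final Pattern LINE = Pattern.compile(
      "^\\[(\\w+)\\] \\[\\w+ (\\d{4}/\\d{2}/\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) \\w+ tid=(\\d+)\\] (.*)$");
  private static final DateTimeFormatter STAMP =
      DateTimeFormatter.ofPattern("uuuu/MM/dd HH:mm:ss.SSS");

  /**
   * Returns every line that repeats the vm, tid and message of the line directly
   * before it, provided the two timestamps are at most maxGapMillis apart.
   */
  static List<String> consecutiveDuplicates(List<String> lines, long maxGapMillis) {
    List<String> duplicates = new ArrayList<>();
    String previousKey = null;
    LocalDateTime previousTime = null;
    for (String line : lines) {
      Matcher m = LINE.matcher(line.trim());
      if (!m.matches()) {
        // A line without the expected header (e.g. "old view is: ...") ends any run.
        previousKey = null;
        previousTime = null;
        continue;
      }
      String key = m.group(1) + "|" + m.group(3) + "|" + m.group(4);
      LocalDateTime time = LocalDateTime.parse(m.group(2), STAMP);
      if (key.equals(previousKey)
          && Duration.between(previousTime, time).abs().toMillis() <= maxGapMillis) {
        duplicates.add(line);
      }
      previousKey = key;
      previousTime = time;
    }
    return duplicates;
  }
}
```

The gap parameter is there because the two vm0 lines above are 1 ms apart, while the repeated suspect messages are roughly 500 ms apart and look more like periodic retries than double logging; a small gap separates the two cases.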
Geode unit tests completed in 'develop/DistributedTest' with non-zero exit code
Pipeline results can be found at: Concourse: https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/DistributedTest/builds/220
Broken: apache/geode#6976 (add-extensions-to-classpath-155484283 - bcafe0e)
Build Update for apache/geode - Build: #6976 Status: Broken Duration: 20 minutes and 47 seconds Commit: bcafe0e (add-extensions-to-classpath-155484283) Author: Orhan Kislal Message: Add extensions jars to locator/server's classpath [GEODE-4923] Signed-off-by: Jianxia Chen View the changeset: https://github.com/apache/geode/compare/73675ab210a9...bcafe0e890d9 View the full build log and details: https://travis-ci.org/apache/geode/builds/357137786?utm_source=email_medium=notification
Broken: apache/geode#6974 (add-extensions-to-classpath-155484283 - 73675ab)
Build Update for apache/geode - Build: #6974 Status: Broken Duration: 19 minutes and 7 seconds Commit: 73675ab (add-extensions-to-classpath-155484283) Author: Orhan Kislal Message: Add tests Signed-off-by: Jianxia Chen View the changeset: https://github.com/apache/geode/compare/353eb4a44031...73675ab210a9 View the full build log and details: https://travis-ci.org/apache/geode/builds/357127900?utm_source=email_medium=notification
Errored: apache/geode#6962 (develop - d23c6d1)
Build Update for apache/geode - Build: #6962 Status: Errored Duration: 8 minutes and 11 seconds Commit: d23c6d1 (develop) Author: Jens Deppe Message: GEODE-4386: Add gfsh command to describe jndi-binding (#1653) View the changeset: https://github.com/apache/geode/compare/3d5ad6903c73...d23c6d1c40a3 View the full build log and details: https://travis-ci.org/apache/geode/builds/356967913?utm_source=email_medium=notification
Re: [DISCUSS] New List for Commit and CI Emails
+0 I think notifications sent to another list are likely to be ignored, or at least it will be hard to make sure people sign up for the new list. I would be for anything to 'optimize' the amount of emails from the automated systems.

--Mark

On Thu, Mar 22, 2018 at 12:55 PM, Ernest Burghardt wrote:
> +1 for less noise and spam
>
> On Wed, Mar 21, 2018 at 11:56 AM, Galen O'Sullivan wrote:
> > Yeah, I think I'm finding myself convinced by Swapnil's argument.
> >
> > How about muting the "nightly build succeeded" email?
> >
> > On Wed, Mar 21, 2018 at 9:58 AM, Sean Goller wrote:
> > > Concourse sends mail whenever a job fails.
> > >
> > > On Wed, Mar 21, 2018 at 9:49 AM, Swapnil Bawaskar <sbawas...@pivotal.io> wrote:
> > > > I know travis is already configured to send emails only when the build breaks and then when it is fixed. Is concourse configured the same?
> > > >
> > > > On Wed, Mar 21, 2018 at 9:38 AM Patrick Rhomberg <prhomb...@pivotal.io> wrote:
> > > > > I'm with Swapnil on this one. I think the way we make it less noisy is to take the time to fix the failing tests.
> > > > >
> > > > > I suppose we could split the difference and give the CI emails a, say, daily cadence. No news is good news, or else it gives you all the failures in the last 24 hours. Don't know how easy that would be to cache and report under the existing framework, though.
> > > > >
> > > > > On Wed, Mar 21, 2018 at 12:05 AM, Jacob Barrett <jbarr...@pivotal.io> wrote:
> > > > > > It’s sad that the most frequent spammer... e... I mean mailer is the new CI process. If we aren’t going to send it elsewhere how can we make it less noisy?
> > > > > >
> > > > > > -Jake
> > > > > >
> > > > > > > On Mar 20, 2018, at 8:37 PM, Dan Smith wrote:
> > > > > > >
> > > > > > > I was curious about the stats for bot vs. humans on the dev list. Out of 915 messages, looks like we're about 50% robot.
> > > > > > >
> > > > > > > I'm still in favor of not sending these messages to dev@geode. Long time members have probably already created a mail filter by now (I know I have) so we're only hurting newbies by sending a bunch of messages.
> > > > > > >
> > > > > > > 1) apac...@gmail.com 241
> > > > > > > 2) Spring CI 109
> > > > > > > 3) Kirk Lund 63
> > > > > > > 4) Apache Jenkins Server 51
> > > > > > > 5) Anthony Baker 41
> > > > > > > 6) Dan Smith 40
> > > > > > > 7) Travis CI 38
Re: Apache Jira access
Done - you should have access now.

Thanks,
-Dan

On Thu, Mar 22, 2018 at 1:52 PM, Ivan Godwin wrote:
> igodwin
>
> Thank you, Dan.
>
> On Thu, Mar 22, 2018 at 9:55 AM, Dan Smith wrote:
> > Hi Ivan,
> >
> > What's your JIRA username?
> >
> > -Dan
> >
> > On Thu, Mar 22, 2018 at 9:53 AM, Ivan Godwin wrote:
> > > Hello,
> > >
> > > I am requesting access to Jira so that I may open issues.
> > >
> > > Ivan Godwin
[Spring CI] Spring Data GemFire > Nightly-ApacheGeode > #864 was SUCCESSFUL (with 2379 tests)
--- Spring Data GemFire > Nightly-ApacheGeode > #864 was successful. --- Scheduled 2381 tests in total. https://build.spring.io/browse/SGF-NAG-864/ -- This message is automatically generated by Atlassian Bamboo
Re: Apache Jira access
igodwin

Thank you, Dan.

On Thu, Mar 22, 2018 at 9:55 AM, Dan Smith wrote:
> Hi Ivan,
>
> What's your JIRA username?
>
> -Dan
>
> On Thu, Mar 22, 2018 at 9:53 AM, Ivan Godwin wrote:
> > Hello,
> >
> > I am requesting access to Jira so that I may open issues.
> >
> > Ivan Godwin
Re: [DISCUSS] New List for Commit and CI Emails
+1 for less noise and spam

On Wed, Mar 21, 2018 at 11:56 AM, Galen O'Sullivan wrote:
> Yeah, I think I'm finding myself convinced by Swapnil's argument.
>
> How about muting the "nightly build succeeded" email?
>
> On Wed, Mar 21, 2018 at 9:58 AM, Sean Goller wrote:
> > Concourse sends mail whenever a job fails.
> >
> > On Wed, Mar 21, 2018 at 9:49 AM, Swapnil Bawaskar wrote:
> > > I know travis is already configured to send emails only when the build breaks and then when it is fixed. Is concourse configured the same?
> > >
> > > On Wed, Mar 21, 2018 at 9:38 AM Patrick Rhomberg wrote:
> > > > I'm with Swapnil on this one. I think the way we make it less noisy is to take the time to fix the failing tests.
> > > >
> > > > I suppose we could split the difference and give the CI emails a, say, daily cadence. No news is good news, or else it gives you all the failures in the last 24 hours. Don't know how easy that would be to cache and report under the existing framework, though.
> > > >
> > > > On Wed, Mar 21, 2018 at 12:05 AM, Jacob Barrett wrote:
> > > > > It’s sad that the most frequent spammer... e... I mean mailer is the new CI process. If we aren’t going to send it elsewhere how can we make it less noisy?
> > > > >
> > > > > -Jake
> > > > >
> > > > > > On Mar 20, 2018, at 8:37 PM, Dan Smith wrote:
> > > > > >
> > > > > > I was curious about the stats for bot vs. humans on the dev list. Out of 915 messages, looks like we're about 50% robot.
> > > > > >
> > > > > > I'm still in favor of not sending these messages to dev@geode. Long time members have probably already created a mail filter by now (I know I have) so we're only hurting newbies by sending a bunch of messages.
> > > > > >
> > > > > > 1) apac...@gmail.com 241
> > > > > > 2) Spring CI 109
> > > > > > 3) Kirk Lund 63
> > > > > > 4) Apache Jenkins Server 51
> > > > > > 5) Anthony Baker 41
> > > > > > 6) Dan Smith 40
> > > > > > 7) Travis CI 38
Re: Recreate Cache -- is it possible?
A server/peer can have its own cache configuration in addition to the cluster configuration. If we still support that, does the new "reloadNewClusterConfiguration()" take care of it?

-Anil.

On Wed, Mar 21, 2018 at 1:06 PM, Jinmei Liao wrote:
> Sounds like this is a slippery slope. I reworked the strategy: instead of calling cache.close, I only issue a call to the locator to get the cluster configuration again and do a reload of the properties and cacheXml. Here is the PR for this approach:
> https://github.com/apache/geode/pull/1656
>
> Basically this is what the reloadClusterConfiguration does:
> https://github.com/apache/geode/pull/1656/files#diff-14ace6c5abf2f68c480b55a7c882e18c
>
> If you see anything obviously wrong, or even vaguely wrong, please comment on the PR, we will try to test it out.
>
> Thanks!
>
> On Wed, Mar 21, 2018 at 12:42 PM, Kirk Lund wrote:
> > The non-daemon thread in a process launched with ServerLauncher is looping in waitOnServer. When you close the Cache, that loop exits and the ServerLauncher process exits.
> >
> > As Bruce pointed out, JUnit and the DUnit VMs have other non-daemon threads.
> >
> > You might need to alter ServerLauncher.waitOnServer() and LocatorLauncher.waitOnLocator() for what you're doing.
> >
> > On Wed, Mar 21, 2018 at 10:28 AM, Jinmei Liao wrote:
> > > Bruce: this sounds like the root cause of the differences between the dunit test and real app test.
> > >
> > > On Wed, Mar 21, 2018 at 10:22 AM, Bruce Schuchardt <bschucha...@pivotal.io> wrote:
> > > > It's likely that the JVM is exiting because the AcceptorImpl thread is the only non-daemon thread and it is stopped when the cache is closed. DUnit JVMs have a non-daemon main() thread that keeps them alive.
> > > >
> > > > On 3/21/18 9:48 AM, Jinmei Liao wrote:
> > > >> We would like to allow users to import a new set of cluster configuration with running servers as long as we make sure these servers are vanilla servers (servers that are just started with nothing in it). Now since the servers are already up, caches are already created, we will need to re-create the cache with the new xml received from the locator. Originally our implementation on the servers boils down to:
> > > >>
> > > >> cache.close("Re-create Cache", true, true);
> > > >>
> > > >> GemFireCacheImpl.create(oldDs, cacheConfig);
> > > >>
> > > >> but the cache.close call eventually leads to a VM exit (somehow in the DUnit VM, it does not), so this does not work with real application environment. Now we are wondering is there a safe way to recreate the cache instance with a new set of properties/cacheXml without triggering the entire shutdown sequence?
> > >
> > > --
> > > Cheers
> > >
> > > Jinmei
>
> --
> Cheers
>
> Jinmei
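The explanation in the thread above (a JVM stays up only while at least one non-daemon thread is running) can be illustrated in isolation. This is a minimal sketch using plain `java.lang.Thread`, not Geode code: the class `KeepAliveDemo` and its methods are hypothetical names, and the latch merely stands in for a wait loop such as the one Kirk describes in `ServerLauncher.waitOnServer()`.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical demo, not Geode code: a non-daemon thread keeps the JVM alive until released. */
class KeepAliveDemo {
  private final CountDownLatch stop = new CountDownLatch(1);
  private final Thread keeper;

  KeepAliveDemo() {
    keeper = new Thread(() -> {
      try {
        stop.await(); // parks here, standing in for a launcher's wait loop
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "keep-alive");
    // Non-daemon: the JVM will not exit on its own while this thread is running.
    keeper.setDaemon(false);
  }

  void start() {
    keeper.start();
  }

  /** Lets the keeper thread finish; once no non-daemon threads remain, the JVM can exit. */
  void release() {
    stop.countDown();
    try {
      keeper.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  boolean isKeepingJvmAlive() {
    return keeper.isAlive() && !keeper.isDaemon();
  }
}
```

In a process whose only non-daemon thread is such a keeper, calling `release()` is the point at which the JVM becomes free to exit, which is consistent with the behavior reported for closing the cache. A DUnit-style VM with its own non-daemon main thread would keep running regardless.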
Geode unit tests completed in 'develop/AcceptanceTest' with non-zero exit code
Pipeline results can be found at: Concourse: https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/AcceptanceTest/builds/436
Re: Apache Jira access
Hi Ivan,

What's your JIRA username?

-Dan

On Thu, Mar 22, 2018 at 9:53 AM, Ivan Godwin wrote:
> Hello,
>
> I am requesting access to Jira so that I may open issues.
>
> Ivan Godwin
Apache Jira access
Hello, I am requesting access to Jira so that I may open issues. Ivan Godwin
Build for version 1.6.0-build.672 of Apache Geode failed.
The build job for Apache Geode version 1.6.0-build.672 has failed. Build artifacts are available at: http://files.apachegeode-ci.info/builds/1.6.0-build.672/geode-build-artifacts-1.6.0-build.672.tgz Test results are available at: http://files.apachegeode-ci.info/builds/1.6.0-build.672/test-results/build/ Job: https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/Build/builds/703
Re: GEODE Jira Access
Hi Michael,

You should have access now. Thanks!

-Dan

On Wed, Mar 21, 2018 at 8:09 PM, Michael Oleske wrote:
> Just made an account. Username is mole...@pivotal.io
>
> -michael
>
> Michael Oleske
> Software Engineer
> Pivotal - Santa Monica
>
> On Wednesday, March 21, 2018, Dan Smith wrote:
> > Hi Michael - What's your JIRA username? If you don't have one, please go ahead and create an account.
> >
> > -Dan
> >
> > On Wed, Mar 21, 2018 at 3:27 PM, Michael Oleske wrote:
> > > Hi all,
> > >
> > > I'd like to be able to update GEODE Jira to reflect what I'm working on, so was hoping to get access.
> > >
> > > Thanks!
> > > michael
> > >
> > > Michael Oleske
> > > Software Engineer
> > > Pivotal - Santa Monica
Re: [Proposal] Gfsh Command Feature Flag Annotation
Adding or removing options from a SpringShell command may be a bit tricky. This might require two versions of the command -- one version with the option, one without -- and we then load and register only one at runtime.

On Wed, Mar 21, 2018 at 4:55 PM, Swapnil Bawaskar wrote:
> This is a great start, however, there may be features that only add options to gfsh commands rather than adding gfsh commands themselves, we should accommodate those as and when we encounter them.
>
> Udo, I like the idea of having a more generic solution for feature flagging, however, if a feature is only introducing public API, I don't see how we could hide it using an annotation.
>
> On Wed, Mar 21, 2018 at 4:46 PM Swapnil Bawaskar wrote:
> > I like @Disabled too.
> >
> > On Mon, Mar 19, 2018 at 12:02 PM Michael William Dodge <mdo...@pivotal.io> wrote:
> >> I kind of like @Disabled instead.
> >>
> >> Sarge
> >>
> >> > On 19 Mar, 2018, at 11:58, Udo Kohlmeyer wrote:
> >> >
> >> > I wonder if this proposal could not be extended to the greater GEODE product. As this feature flagging is also relevant to other parts of the system and should maybe be consistently applied to all areas.
> >> >
> >> > Thoughts?
> >> >
> >> > On 3/19/18 11:46, Patrick Rhomberg wrote:
> >> >> Hello, All
> >> >>
> >> >> I am interested in extending annotation functionality on our gfsh commands, particularly with respect to feature-flagging commands that are mutually-reliant or not yet feature complete.
> >> >> Please review the proposal [1] at your convenience.
> >> >>
> >> >> Imagination is Change.
> >> >> ~Patrick Rhomberg
> >> >>
> >> >> [1] https://cwiki.apache.org/confluence/display/GEODE/Proposal+for+Gfsh+Feature+Flag
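The "load and register only one at runtime" idea from the thread above could look roughly like the following. This is a sketch under loud assumptions: the `@Disabled` annotation here is a hypothetical stand-in for whatever the proposal ends up defining, `CommandRegistry`, `StableCommand` and `ExperimentalCommand` are made-up names, the system-property switch is an invented mechanism, and none of this uses the real Spring Shell or gfsh registration APIs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker: the command stays hidden unless the named system property is "true". */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Disabled {
  String unlessPropertyIsSet();
}

/** Hypothetical registry; stands in for whatever component loads gfsh commands. */
class CommandRegistry {
  /** Keeps unannotated commands, plus annotated ones whose feature flag is switched on. */
  static List<Class<?>> enabledCommands(List<Class<?>> candidates) {
    List<Class<?>> enabled = new ArrayList<>();
    for (Class<?> candidate : candidates) {
      Disabled flag = candidate.getAnnotation(Disabled.class);
      if (flag == null || Boolean.getBoolean(flag.unlessPropertyIsSet())) {
        enabled.add(candidate);
      }
    }
    return enabled;
  }
}

/** Placeholder for a command that is always available. */
class StableCommand {}

/** Placeholder for a command (or a variant carrying a new option) hidden behind a flag. */
@Disabled(unlessPropertyIsSet = "demo.feature.experimental")
class ExperimentalCommand {}
```

For the options case raised in the thread, the two versions of a command would presumably need complementary flags so that exactly one of them is registered; that pairing logic is not shown here.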