Membership log message questions

2018-03-22 Thread Kirk Lund
I'm looking through logs from a DUnit test that takes longer to run than I
think it should. It looks to me like membership is chewing up more time
than it should, and I have questions about some membership log messages.

Specifically, I'm worried about:
1) double-logging by vm1 when shutdown message is received from vm0

2) double-logging by GMSHealthMonitor in vm0 AFTER it disconnected

3) vm1 KEEPS suspecting vm0 well after a) vm1 received shutdown message
from vm0, b) processed a view with vm0 departed and c) even logged that vm0
gracefully left. (vm1 seems a bit schizophrenic about vm0)

Below are the log messages with my comments interspersed. Please help me
understand what's actually going on here and if we're seeing undesired
behavior (i.e. minor bugs)...

KIRK: vm0 is shutting down gracefully

[vm0] [info 2018/03/22 19:16:27.010 PDT  tid=87] Shutting down DistributionManager
192.168.1.18(34057):32771.

[vm1] [info 2018/03/22 19:16:27.014 PDT  tid=66] received leave request from
192.168.1.18(34057):32771 for 192.168.1.18(34057):32771

KIRK: vm1 states that vm0 gracefully left and logs it twice which seems
like a minor bug

[vm1] [info 2018/03/22 19:16:27.016 PDT  tid=66] Member at 192.168.1.18(34057):32771 gracefully
left the distributed cache: shutdown message received

[vm1] [info 2018/03/22 19:16:27.016 PDT  tid=66] Member at 192.168.1.18(34057):32771 gracefully
left the distributed cache: shutdown message received

[vm1] [info 2018/03/22 19:16:27.332 PDT  tid=41] received new view:
View[192.168.1.18(34059:locator):32770|3] members:
[192.168.1.18(34059:locator):32770,
192.168.1.18(34058):32772{lead}]  shutdown:
[192.168.1.18(34057):32771]
[vm1] old view is: View[192.168.1.18(34059:locator):32770|2]
members: [192.168.1.18(34059:locator):32770,
192.168.1.18(34057):32771{lead}, 192.168.1.18(34058):32772]

KIRK: even though vm1 just received a new view it initiates suspect processing
for vm0 which also seems like a minor bug

[vm1] [info 2018/03/22 19:16:28.186 PDT  tid=41] Membership ignoring suspect
request for SuspectMembersMessage [suspectRequests=[SuspectRequest
[member=192.168.1.18(34058):32772, reason=Member isn't responding
to heartbeat requests]]] from non-member 192.168.1.18:32771

[vm1] [info 2018/03/22 19:16:28.683 PDT  tid=41] Membership ignoring suspect
request for SuspectMembersMessage [suspectRequests=[SuspectRequest
[member=192.168.1.18(34058):32772, reason=Member isn't responding
to heartbeat requests]]] from non-member 192.168.1.18:32771

[vm1] [info 2018/03/22 19:16:29.180 PDT  tid=41] Membership ignoring suspect
request for SuspectMembersMessage [suspectRequests=[SuspectRequest
[member=192.168.1.18(34058):32772, reason=Member isn't responding
to heartbeat requests]]] from non-member 192.168.1.18:32771

[vm1] [info 2018/03/22 19:16:29.681 PDT  tid=41] Membership ignoring suspect
request for SuspectMembersMessage [suspectRequests=[SuspectRequest
[member=192.168.1.18(34058):32772, reason=Member isn't responding
to heartbeat requests]]] from non-member 192.168.1.18:32771

[vm1] [info 2018/03/22 19:16:30.183 PDT  tid=41] Membership ignoring suspect
request for SuspectMembersMessage [suspectRequests=[SuspectRequest
[member=192.168.1.18(34059:locator):32770, reason=Member isn't
responding to heartbeat requests]]] from non-member
192.168.1.18:32771

KIRK: vm0 surprisingly continues logging -- why is it suspecting other
members? why does it still have GMSHealthMonitor running? why does it log
the message twice? and again logging it twice looks like a minor bug to me
unless all of this double logging that I'm seeing means membership is doing
twice the work which would be horrible (please put my mind to rest about
this)

[vm0] [info 2018/03/22 19:06:40.319 PDT  tid=41] All other members are suspect at this point

[vm0] [info 2018/03/22 19:06:40.320 PDT  tid=41] All other members are suspect at this point

KIRK: after the above 2 messages vm1 continues logging about suspect (why?
vm0 gracefully left!)

[vm1] [info 2018/03/22 19:16:30.682 PDT  tid=41] Membership ignoring suspect
request for SuspectMembersMessage [suspectRequests=[SuspectRequest
[member=192.168.1.18(34059:locator):32770, reason=Member isn't
responding to heartbeat requests]]] from non-member
192.168.1.18:32771

[vm1] [info 2018/03/22 19:16:31.181 PDT  tid=41] Membership ignoring suspect
request for SuspectMembersMessage [suspectRequests=[SuspectRequest
[member=192.168.1.18(34059:locator):32770, reason=Member isn't
responding to heartbeat requests]]] from non-member
192.168.1.18:32771

[vm1] [info 2018/03/22 19:16:31.680 PDT  tid=41] Membership ignoring suspect
request for SuspectMembersMessage [suspectRequests=[SuspectRequest
[member=192.168.1.18(34059:locator):32770, reason=Member isn't
responding to heartbeat requests]]] from non-member
192.168.1.18:32771
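
One way to check the double-logging question mechanically is to scan the log for adjacent entries that differ only in their milliseconds. The following is a minimal sketch, not Geode code: the class and method names are hypothetical, and the regular expression assumes each entry sits on a single line in the shape shown in the excerpts above (`[vm] [level date time zone  tid=N] message`), which may not match the full, unwrapped log format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateLogFinder {
  // Assumed entry shape, taken from the excerpts in this thread:
  // [vm1] [info 2018/03/22 19:16:27.016 PDT  tid=66] message text
  private static final Pattern LINE = Pattern.compile(
      "^\\[(\\w+)\\] \\[\\w+ (\\d{4}/\\d{2}/\\d{2} \\d{2}:\\d{2}:\\d{2})\\.\\d{3} \\w+\\s+tid=(\\d+)\\] (.*)$");

  /** Builds a comparison key of vm, timestamp to the second, thread id and message. */
  public static String key(String line) {
    Matcher m = LINE.matcher(line);
    if (!m.matches()) {
      return null; // not a log entry in the assumed shape
    }
    return m.group(1) + "|" + m.group(2) + "|" + m.group(3) + "|" + m.group(4);
  }

  /** Returns every entry whose key equals the key of the entry directly before it. */
  public static List<String> findAdjacentDuplicates(List<String> lines) {
    List<String> duplicates = new ArrayList<>();
    String previousKey = null;
    for (String line : lines) {
      String currentKey = key(line);
      if (currentKey != null && currentKey.equals(previousKey)) {
        duplicates.add(line);
      }
      previousKey = currentKey;
    }
    return duplicates;
  }
}
```

A hit from this scan only shows that the same message was written twice by the same thread within the same second; it does not by itself show whether the underlying work was done twice.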


Geode unit tests completed in 'develop/DistributedTest' with non-zero exit code

2018-03-22 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/DistributedTest/builds/220



Broken: apache/geode#6976 (add-extensions-to-classpath-155484283 - bcafe0e)

2018-03-22 Thread Travis CI
Build Update for apache/geode
-

Build: #6976
Status: Broken

Duration: 20 minutes and 47 seconds
Commit: bcafe0e (add-extensions-to-classpath-155484283)
Author: Orhan Kislal
Message: Add extensions jars to locator/server's classpath

[GEODE-4923]

Signed-off-by: Jianxia Chen 

View the changeset: 
https://github.com/apache/geode/compare/73675ab210a9...bcafe0e890d9

View the full build log and details: 
https://travis-ci.org/apache/geode/builds/357137786?utm_source=email_medium=notification

--

You can configure recipients for build notifications in your .travis.yml file. 
See https://docs.travis-ci.com/user/notifications

This email was sent to dev@geode.apache.org (mailto:dev@geode.apache.org)
unsubscribe from this list 
(http://clicks.travis-ci.com/track/unsub.php?u=14313403=99a5915d1cca49a097118ad60657a813.J7HZbFy6S8dTlH7tD%2B7uJ8FM8HM%3D=https%3A%2F%2Fmandrillapp.com%2Funsub%3Fmd_email%3Ddev%2540geode.apache.org)

Broken: apache/geode#6974 (add-extensions-to-classpath-155484283 - 73675ab)

2018-03-22 Thread Travis CI
Build Update for apache/geode
-

Build: #6974
Status: Broken

Duration: 19 minutes and 7 seconds
Commit: 73675ab (add-extensions-to-classpath-155484283)
Author: Orhan Kislal
Message: Add tests

Signed-off-by: Jianxia Chen 

View the changeset: 
https://github.com/apache/geode/compare/353eb4a44031...73675ab210a9

View the full build log and details: 
https://travis-ci.org/apache/geode/builds/357127900?utm_source=email_medium=notification


Errored: apache/geode#6962 (develop - d23c6d1)

2018-03-22 Thread Travis CI
Build Update for apache/geode
-

Build: #6962
Status: Errored

Duration: 8 minutes and 11 seconds
Commit: d23c6d1 (develop)
Author: Jens Deppe
Message: GEODE-4386: Add gfsh command to describe jndi-binding (#1653)

View the changeset: 
https://github.com/apache/geode/compare/3d5ad6903c73...d23c6d1c40a3

View the full build log and details: 
https://travis-ci.org/apache/geode/builds/356967913?utm_source=email_medium=notification


Re: [DISCUSS] New List for Commit and CI Emails

2018-03-22 Thread Mark Bretl
+0

I think notifications sent to another list are likely to be ignored, or
at least it will be hard to make sure people sign up for the new list. I
would be for anything that 'optimizes' the number of emails from the
automated systems.

--Mark

On Thu, Mar 22, 2018 at 12:55 PM, Ernest Burghardt 
wrote:

> +1 for less noise and spam
>
> On Wed, Mar 21, 2018 at 11:56 AM, Galen O'Sullivan 
> wrote:
>
> > Yeah, I think I'm finding myself convinced by Swapnil's argument.
> >
> > How about muting the "nightly build succeeded" email?
> >
> > On Wed, Mar 21, 2018 at 9:58 AM, Sean Goller  wrote:
> >
> > > Concourse sends mail whenever a job fails.
> > >
> > > On Wed, Mar 21, 2018 at 9:49 AM, Swapnil Bawaskar <
> sbawas...@pivotal.io>
> > > wrote:
> > >
> > > > I know travis is already configured to send emails only when the
> build
> > > > breaks and then when it is fixed. Is concourse configured the same?
> > > >
> > > > On Wed, Mar 21, 2018 at 9:38 AM Patrick Rhomberg <
> prhomb...@pivotal.io
> > >
> > > > wrote:
> > > >
> > > > > I'm with Swapnil on this one.  I think the way we make it less
> noisy
> > is
> > > > to
> > > > > take the time to fix the failing tests.
> > > > >
> > > > > I suppose we could split the difference and give the CI emails a,
> > say,
> > > > > daily cadence.  No news is good news, or else it gives you all the
> > > > failures
> > > > > in the last 24 hours.  Don't know how easy that would be to cache
> and
> > > > > report under the existing framework, though.
> > > > >
> > > > > On Wed, Mar 21, 2018 at 12:05 AM, Jacob Barrett <
> jbarr...@pivotal.io
> > >
> > > > > wrote:
> > > > >
> > > > > > It’s sad that the most frequent spammer... e... I mean mailer
> > is
> > > > the
> > > > > > new CI process. If we aren’t going to send it elsewhere how can
> we
> > > make
> > > > > it
> > > > > > less noisy?
> > > > > >
> > > > > > -Jake
> > > > > >
> > > > > >
> > > > > > > On Mar 20, 2018, at 8:37 PM, Dan Smith 
> > wrote:
> > > > > > >
> > > > > > > I was curious about the stats for bot vs. humans on the dev
> list.
> > > Out
> > > > > of
> > > > > > > 915 messages, looks like we're about 50% robot.
> > > > > > >
> > > > > > > > I'd still be in favor of not sending these messages to
> dev@geode.
> > > > Long
> > > > > > time
> > > > > > > members have probably already created a mail filter by now (I
> > know
> > > I
> > > > > > have)
> > > > > > > so we're only hurting newbies by sending a bunch of messages.
> > > > > > >
> > > > > > > 1) apac...@gmail.com 241
> > > > > > > 2) Spring CI 109
> > > > > > > 3) Kirk Lund 63
> > > > > > > 4) Apache Jenkins Server 51
> > > > > > > 5) Anthony Baker 41
> > > > > > > 6) Dan Smith 40
> > > > > > > 7) Travis CI 38
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Apache Jira access

2018-03-22 Thread Dan Smith
Done - you should have access now.

Thanks,
-Dan

On Thu, Mar 22, 2018 at 1:52 PM, Ivan Godwin  wrote:

> igodwin
>
> Thank you, Dan.
>
> On Thu, Mar 22, 2018 at 9:55 AM, Dan Smith  wrote:
>
> > Hi Ivan,
> >
> > What's your JIRA username?
> >
> > -Dan
> >
> > On Thu, Mar 22, 2018 at 9:53 AM, Ivan Godwin  wrote:
> >
> > > Hello,
> > >
> > > I am requesting access to Jira so that I may open issues.
> > >
> > > Ivan Godwin
> > >
> >
>


[Spring CI] Spring Data GemFire > Nightly-ApacheGeode > #864 was SUCCESSFUL (with 2379 tests)

2018-03-22 Thread Spring CI

---
Spring Data GemFire > Nightly-ApacheGeode > #864 was successful.
---
Scheduled
2381 tests in total.

https://build.spring.io/browse/SGF-NAG-864/





--
This message is automatically generated by Atlassian Bamboo

Re: Apache Jira access

2018-03-22 Thread Ivan Godwin
igodwin

Thank you, Dan.

On Thu, Mar 22, 2018 at 9:55 AM, Dan Smith  wrote:

> Hi Ivan,
>
> What's your JIRA username?
>
> -Dan
>
> On Thu, Mar 22, 2018 at 9:53 AM, Ivan Godwin  wrote:
>
> > Hello,
> >
> > I am requesting access to Jira so that I may open issues.
> >
> > Ivan Godwin
> >
>


Re: [DISCUSS] New List for Commit and CI Emails

2018-03-22 Thread Ernest Burghardt
+1 for less noise and spam

On Wed, Mar 21, 2018 at 11:56 AM, Galen O'Sullivan 
wrote:

> Yeah, I think I'm finding myself convinced by Swapnil's argument.
>
> How about muting the "nightly build succeeded" email?
>
> On Wed, Mar 21, 2018 at 9:58 AM, Sean Goller  wrote:
>
> > Concourse sends mail whenever a job fails.
> >
> > On Wed, Mar 21, 2018 at 9:49 AM, Swapnil Bawaskar 
> > wrote:
> >
> > > I know travis is already configured to send emails only when the build
> > > breaks and then when it is fixed. Is concourse configured the same?
> > >
> > > On Wed, Mar 21, 2018 at 9:38 AM Patrick Rhomberg  >
> > > wrote:
> > >
> > > > I'm with Swapnil on this one.  I think the way we make it less noisy
> is
> > > to
> > > > take the time to fix the failing tests.
> > > >
> > > > I suppose we could split the difference and give the CI emails a,
> say,
> > > > daily cadence.  No news is good news, or else it gives you all the
> > > failures
> > > > in the last 24 hours.  Don't know how easy that would be to cache and
> > > > report under the existing framework, though.
> > > >
> > > > On Wed, Mar 21, 2018 at 12:05 AM, Jacob Barrett  >
> > > > wrote:
> > > >
> > > > > It’s sad that the most frequent spammer... e... I mean mailer
> is
> > > the
> > > > > new CI process. If we aren’t going to send it elsewhere how can we
> > make
> > > > it
> > > > > less noisy?
> > > > >
> > > > > -Jake
> > > > >
> > > > >
> > > > > > On Mar 20, 2018, at 8:37 PM, Dan Smith 
> wrote:
> > > > > >
> > > > > > I was curious about the stats for bot vs. humans on the dev list.
> > Out
> > > > of
> > > > > > 915 messages, looks like we're about 50% robot.
> > > > > >
> > > > > > > I'd still be in favor of not sending these messages to dev@geode.
> > > Long
> > > > > time
> > > > > > members have probably already created a mail filter by now (I
> know
> > I
> > > > > have)
> > > > > > so we're only hurting newbies by sending a bunch of messages.
> > > > > >
> > > > > > 1) apac...@gmail.com 241
> > > > > > 2) Spring CI 109
> > > > > > 3) Kirk Lund 63
> > > > > > 4) Apache Jenkins Server 51
> > > > > > 5) Anthony Baker 41
> > > > > > 6) Dan Smith 40
> > > > > > 7) Travis CI 38
> > > > >
> > > >
> > >
> >
>


Re: Recreate Cache -- is it possible?

2018-03-22 Thread Anilkumar Gingade
A server/peer can have its own cache configuration in addition to the
cluster configuration. If we still support that, does the new
"reloadNewClusterConfiguration()" take care of it?

-Anil.


On Wed, Mar 21, 2018 at 1:06 PM, Jinmei Liao  wrote:

> Sounds like this is a slippery slope. I reworked the strategy: instead of
> calling cache.close, I only issue a call to the locator to get the cluster
> configuration again and do a reload of the properties and cacheXml. Here is
> the PR for this approach:
> https://github.com/apache/geode/pull/1656
>
> Basically this is what the reloadClusterConfiguration does:
> https://github.com/apache/geode/pull/1656/files#diff-
> 14ace6c5abf2f68c480b55a7c882e18c
>
> If you see anything obviously wrong, or even vaguely wrong, please comment
> on the PR, we will try to test it out.
>
> Thanks!
>
> On Wed, Mar 21, 2018 at 12:42 PM, Kirk Lund  wrote:
>
> > The non-daemon thread in a process launched with ServerLauncher is
> looping
> > in waitOnServer. When you close the Cache, that loop exits and the
> > ServerLauncher process exits.
> >
> > As Bruce pointed out, JUnit and the DUnit VMs have other non-daemon
> > threads.
> >
> > You might need to alter ServerLauncher.waitOnServer() and
> > LocatorLauncher.waitOnLocator() for what you're doing.
> >
> > On Wed, Mar 21, 2018 at 10:28 AM, Jinmei Liao  wrote:
> >
> > > > Bruce: this sounds like the root cause of the differences between the
> > > > dunit test and real app test.
> > >
> > > On Wed, Mar 21, 2018 at 10:22 AM, Bruce Schuchardt <
> > bschucha...@pivotal.io
> > > >
> > > wrote:
> > >
> > > > It's likely that the JVM is exiting because the AcceptorImpl thread
> is
> > > the
> > > > only non-daemon thread and it is stopped when the cache is closed.
> > DUnit
> > > > JVMs have a non-daemon main() thread that keeps them alive.
> > > >
> > > >
> > > >
> > > > On 3/21/18 9:48 AM, Jinmei Liao wrote:
> > > >
> > > >> We would like to allow users to import a new set of cluster
> > > configuration
> > > >> with running servers as long as we make sure these servers are
> vanilla
> > > >> servers (servers that are just started with nothing in it). Now
> since
> > > the
> > > >> servers are already up, caches are already created, we will need to
> > > >> re-create the cache with the new xml received from the locator.
> > > Originally
> > > >> our implementation on the servers boils down to:
> > > >>
> > > >> cache.close("Re-create Cache", true, true);
> > > >>
> > > >> GemFireCacheImpl.create(oldDs, cacheConfig);
> > > >>
> > > >>
> > > > >> but the cache.close call eventually leads to a VM exit (somehow in the
> > > > >> DUnit VM, it does not), so this does not work in a real application
> > > > >> environment. Now we are wondering: is there a safe way to recreate the
> > > > >> cache instance with a new set of properties/cacheXml without
> > > > >> triggering the entire shutdown sequence?
> > > >>
> > > >>
> > > >>
> > > >
> > >
> > >
> > > --
> > > Cheers
> > >
> > > Jinmei
> > >
> >
>
>
>
> --
> Cheers
>
> Jinmei
>
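
The daemon vs. non-daemon point made by Bruce and Kirk above can be illustrated with plain JDK threads. This is a minimal sketch and does not use any Geode classes: the class name, the helper methods and the thread name are hypothetical. A JVM exits once its last non-daemon thread finishes, so a process whose only non-daemon thread stops on cache close will exit, while a DUnit VM with a non-daemon main() thread stays up.

```java
import java.util.concurrent.CountDownLatch;

public class NonDaemonThreads {
  /** Counts live threads that would currently keep the JVM from exiting. */
  public static long countLiveNonDaemonThreads() {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t.isAlive() && !t.isDaemon())
        .count();
  }

  /** Starts a non-daemon thread that waits until the latch is released. */
  public static Thread startKeepAlive(CountDownLatch release) {
    Thread t = new Thread(() -> {
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "keep-alive (hypothetical)");
    // Set explicitly: a new thread otherwise inherits the daemon flag of its creator.
    t.setDaemon(false);
    t.start();
    return t;
  }

  /** Releases the keep-alive thread and reports whether it finished in time. */
  public static boolean releaseAndJoin(CountDownLatch release, Thread t, long timeoutMillis) {
    release.countDown();
    try {
      t.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !t.isAlive();
  }
}
```

Printing the result of countLiveNonDaemonThreads() just before and after closing the cache in both environments would be one way to see which threads differ between the DUnit VM and a launcher-started server.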


Geode unit tests completed in 'develop/AcceptanceTest' with non-zero exit code

2018-03-22 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/AcceptanceTest/builds/436



Re: Apache Jira access

2018-03-22 Thread Dan Smith
Hi Ivan,

What's your JIRA username?

-Dan

On Thu, Mar 22, 2018 at 9:53 AM, Ivan Godwin  wrote:

> Hello,
>
> I am requesting access to Jira so that I may open issues.
>
> Ivan Godwin
>


Apache Jira access

2018-03-22 Thread Ivan Godwin
Hello,

I am requesting access to Jira so that I may open issues.

Ivan Godwin


Build for version 1.6.0-build.672 of Apache Geode failed.

2018-03-22 Thread apachegeodeci
=

The build job for Apache Geode version 1.6.0-build.672 has failed.


Build artifacts are available at:
http://files.apachegeode-ci.info/builds/1.6.0-build.672/geode-build-artifacts-1.6.0-build.672.tgz

Test results are available at:
http://files.apachegeode-ci.info/builds/1.6.0-build.672/test-results/build/


Job: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/Build/builds/703

=


Re: GEODE Jira Access

2018-03-22 Thread Dan Smith
Hi Michael,

You should have access now. Thanks!

-Dan

On Wed, Mar 21, 2018 at 8:09 PM, Michael Oleske  wrote:

> Just made an account. Username is mole...@pivotal.io
>
> -michael
>
> Michael Oleske
> Software Engineer
> Pivotal - Santa Monica
>
> On Wednesday, March 21, 2018, Dan Smith  wrote:
>
> > Hi Michael - What's your JIRA username? If you don't have one, please go
> > ahead and create an account.
> >
> > -Dan
> >
> > On Wed, Mar 21, 2018 at 3:27 PM, Michael Oleske 
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to be able to update GEODE Jira to reflect what I'm working
> on,
> > so
> > > was hoping to get access.
> > >
> > > Thanks!
> > > michael
> > >
> > > Michael Oleske
> > > Software Engineer
> > > Pivotal - Santa Monica
> > >
> >
>


Re: [Proposal] Gfsh Command Feature Flag Annotation

2018-03-22 Thread Kirk Lund
Adding or removing options from a SpringShell command may be a bit tricky.
This might require two versions of the command -- one version with the
option, one without -- and we then load and register only one at runtime.
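
As a rough illustration of the "two versions, register one" idea, here is a minimal sketch in plain Java. It does not use the SpringShell API and is not taken from the proposal: the @Disabled annotation borrows its name from this thread, but its attribute, the system property name and both command classes are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class FeatureFlagSketch {
  /** Hypothetical flag annotation; kept at runtime so it can be read reflectively. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface Disabled {
    /** System property that re-enables the command when set to "true". */
    String unlessPropertySet() default "";
  }

  /** Hypothetical command version without the new option. */
  public static class ListThingsCommand {}

  /** Hypothetical command version with the new option, hidden by default. */
  @Disabled(unlessPropertySet = "sketch.enableNewOption")
  public static class ListThingsWithOptionCommand {}

  /** A class is enabled if it is not flagged, or if its override property is "true". */
  public static boolean isEnabled(Class<?> commandClass) {
    Disabled flag = commandClass.getAnnotation(Disabled.class);
    if (flag == null) {
      return true;
    }
    return !flag.unlessPropertySet().isEmpty() && Boolean.getBoolean(flag.unlessPropertySet());
  }

  /** Picks the flagged version when it is enabled, otherwise the fallback version. */
  public static Class<?> select(Class<?> flagged, Class<?> fallback) {
    return isEnabled(flagged) ? flagged : fallback;
  }
}
```

Whatever loads the commands would then register only the class returned by select(...), so exactly one version of the command is visible in a given session.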

On Wed, Mar 21, 2018 at 4:55 PM, Swapnil Bawaskar 
wrote:

> This is a great start, however, there may be features that only add options
> to gfsh commands rather than adding gfsh commands themselves, we should
> accommodate those as and when we encounter them.
>
> Udo, I like the idea of having a more generic solution for feature
> flagging, however, if a feature is only introducing public API, I don't see
> how we could hide it using an annotation.
>
> On Wed, Mar 21, 2018 at 4:46 PM Swapnil Bawaskar 
> wrote:
>
> > I like @Disabled too.
> >
> > On Mon, Mar 19, 2018 at 12:02 PM Michael William Dodge <
> mdo...@pivotal.io>
> > wrote:
> >
> >> I kind of like @Disabled instead.
> >>
> >> Sarge
> >>
> >> > On 19 Mar, 2018, at 11:58, Udo Kohlmeyer  wrote:
> >> >
> >> > I wonder if this proposal could not be extended to the greater GEODE
> >> product. As this feature flagging is also relevant to other parts of the
> >> system and should maybe be consistently applied to all areas.
> >> >
> >> > Thoughts?
> >> >
> >> >
> >> > On 3/19/18 11:46, Patrick Rhomberg wrote:
> >> >> Hello, All
> >> >>
> >> >>   I am interested in extending annotation functionality on our gfsh
> >> >> commands, particularly with respect to feature-flagging commands that
> >> are
> >> >> mutually-reliant or not yet feature complete.
> >> >>   Please review the proposal [1] at your convenience.
> >> >>
> >> >> Imagination is Change.
> >> >> ~Patrick Rhomberg
> >> >>
> >> >> [1]
> >> >>
> >> https://cwiki.apache.org/confluence/display/GEODE/
> Proposal+for+Gfsh+Feature+Flag
> >> >>
> >> >
> >>
> >>
>