Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Ju@N
Hello team, I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in release 1.10.0. Long story short: a *NullPointerException* can be continuously thrown and flood the member's logs if a serial event processor (either *async-event-queue* or *gateway-sender*) starts processing events

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Eric Shu
+1 On Thu, Aug 15, 2019 at 9:54 AM John Blum wrote: > +1 > > On Thu, Aug 15, 2019 at 5:30 AM Ju@N wrote: > > > Hello team, > > > > I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in > release > > 1.10.0. > > Long story short: a *NullPointerException* can be continuously thrown

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Udo Kohlmeyer
Seems everyone is in favor of including a /*non-critical*/ fix to an already cut branch of a potential release... Am I missing something? Why cut a release at all... just have a perpetual cycle of fixes added to develop and users can choose what nightly snapshot build they would want to

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Nabarun Nag
+1 On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann wrote: > +1 > > Agreed to fixing this. It's impossible for a user to discover they hit an > edge case that we fail to support till they are in prod and restart. > > On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos > wrote: > > > Hello Udo,

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Udo Kohlmeyer
Juan, From your explanation, it seems this issue is existing and not critical. Could we possibly hold this for 1.11? --Udo On 8/15/19 5:29 AM, Ju@N wrote: Hello team, I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in release 1.10.0. Long story short: a

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Alexander Murmann
+1 Agreed to fixing this. It's impossible for a user to discover they hit an edge case that we fail to support till they are in prod and restart. On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos wrote: > Hello Udo, > > Even if it is an existing issue I'd still consider it critical for those >

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Anthony Baker
While we can’t fix *all known bugs*, I think where we do have a fix for an important issue we should think hard about the cost of not including that in a release. IMO, the fixed time approach to releases means that we *start* the release effort (including stabilization and bug fixing if

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread John Blum
+1 On Thu, Aug 15, 2019 at 5:30 AM Ju@N wrote: > Hello team, > > I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in release > 1.10.0. > Long story short: a *NullPointerException* can be continuously thrown > and flood the member's logs if a serial event processor (either >

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Juan José Ramos
Hello Udo, Even if it is an existing issue I'd still consider it critical for those cases in which there are unprocessed events on the persistent queue after a restart and the region takes a long time to recover... you can actually see millions of *NPEs* flooding the member's logs. My two cents anyway,

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Murtuza Boxwala
In this specific case, how long has this issue been in the product? When did we first see it? That would give me a lot more context in gauging the “criticality” of this. Juan, can you share that information? To Udo’s point, with every change we check in, we add some risk of instability or at

Re: I propose including the fix for GEODE-3780 in 1.10

2019-08-15 Thread Jacob Barrett
Because someone will ask, can we be proactive in these requests by identifying whether the issue being fixed was introduced in Geode 1.10 or is a preexisting condition? -jake > On Aug 15, 2019, at 2:09 PM, Bruce Schuchardt wrote: > > This is a fix for a problem where a member that has lost

Re: Propose fix for 1.10 release: Prevent NPE in getLocalSize()

2019-08-15 Thread Dan Smith
Normally cherry-picking to the release branch is the release manager's job (Dick in this case) [1]. He asked me to help out while he was on vacation, so I will go ahead and cherry-pick it over. I kinda like the process Jake proposed though - creating a PR against the release branch. My only

Re: Propose fix for 1.10 release: Prevent NPE in getLocalSize()

2019-08-15 Thread Kirk Lund
I have the cherry-pick ready to push or file a PR. Let me know what you prefer... On Thu, Aug 15, 2019 at 3:01 PM Dan Smith wrote: > Normally cherry-picking to the release branch is the release managers job > (Dick in this case) [1]. He asked me to help out while he was on vacation, > so I will

Re: Propose fix for 1.10 release: Prevent NPE in getLocalSize()

2019-08-15 Thread Kirk Lund
Done! On Thu, Aug 15, 2019 at 3:21 PM Dan Smith wrote: > @kirk - go ahead and push it. > > -Dan > > On Thu, Aug 15, 2019 at 3:13 PM Kirk Lund wrote: > > > I have the cherry-pick ready to push or file a PR. Let me know what you > > prefer... > > > > On Thu, Aug 15, 2019 at 3:01 PM Dan Smith

Re: I propose including the fix for GEODE-3780 in 1.10

2019-08-15 Thread Udo Kohlmeyer
Looking at the Geode ticket number, it seems this problem has resurfaced, as it seems to have been addressed in 1.7.0 already. My concern is, do we know WHAT caused it to resurface? Or was this issue always dormant and only recently resurfaced? Without understanding why we are seeing "fixed"

Re: Propose fix for 1.10 release: Prevent NPE in getLocalSize()

2019-08-15 Thread Dan Smith
@kirk - go ahead and push it. -Dan On Thu, Aug 15, 2019 at 3:13 PM Kirk Lund wrote: > I have the cherry-pick ready to push or file a PR. Let me know what you > prefer... > > On Thu, Aug 15, 2019 at 3:01 PM Dan Smith wrote: > > > Normally cherry-picking to the release branch is the release

Re: I propose including the fix for GEODE-3780 in 1.10

2019-08-15 Thread Bruce Schuchardt
Testing in the past week hit this problem 9 times and it was identified as a new issue. On 8/15/19 2:23 PM, Jacob Barrett wrote: Because someone will ask, can we be proactive in these request with identifying if the issue being fixed is introduced in Geode 1.10 or is a preexisting

Re: Propose fix for 1.10 release: Prevent NPE in getLocalSize()

2019-08-15 Thread Jacob Barrett
You should be able to do the cherry-pick on your fork and then open a PR against the release branch. > On Aug 15, 2019, at 2:04 PM, Aaron Lindsey wrote: > > It sounds like there is consensus on adding this fix. Could someone please > cherry-pick this for me? > > Thanks, > Aaron > >> On Aug

Re: I propose including the fix for GEODE-3780 in 1.10

2019-08-15 Thread Bruce Schuchardt
In this case it was another change that is in 1.10 that decreased the amount of time we try to connect to unreachable alert listeners that caused this problem to resurface.  This decrease allowed availability checks to proceed faster than they used to. This allowed an availability check to

New build warnings

2019-08-15 Thread Kirk Lund
Just a reminder, that our many sun.misc.* warnings are drowning out real warnings... We are adding new build warnings which makes me sad. This one was added recently: /Users/klund/dev/geode3/geode-core/src/main/java/org/apache/geode/internal/cache/tier/sockets/AcceptorImpl.java:1075: warning:

I propose including the fix for GEODE-3780 in 1.10

2019-08-15 Thread Bruce Schuchardt
This is a fix for a problem where a member that has lost quorum does not detect it and does not shut down.  The fix is small and has been extensively tested.  The fix also addresses the possibility of a member being kicked out of the cluster when it is only late in delivering a heartbeat

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Udo Kohlmeyer
@Dan, not arguing that logs filling up with NPEs could bring a system down with limited disk space, or potentially swallow important logs that could be helpful in root-causing issues... I'm merely raising the question of why this bug fix should receive priority inclusion. It has been around

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Udo Kohlmeyer
Whilst I agree with "*finish* when we believe the quality of the release branch is sufficient", I disagree that we should have cut a branch and continue to patch that branch with non-critical fixes. i.e., this issue has been around for a while and has no adverse side effects. Issues like

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Dan Smith
+1 to merging Juan's fix for GEODE-7079. I've seen systems taken down by rapidly filling up the logs in the past; this does seem to be a critical fix from the perspective of system stability. Also, this is the change, which doesn't seem particularly risky to me. - ConflationKey key =

Re: Proposal to Include GEODE-7079 in 1.10.0

2019-08-15 Thread Udo Kohlmeyer
I'm changing my vote to +1 on this issue. The ONLY reason I'm changing my vote is to add to the cleanliness of the code of the release. I do 100% disagree with the continual scope creep that we have been incurring on this release branch. --Udo On 8/15/19 12:34 PM, Dan Smith wrote: +1 to

Re: Propose fix for 1.10 release: Prevent NPE in getLocalSize()

2019-08-15 Thread Aaron Lindsey
It sounds like there is consensus on adding this fix. Could someone please cherry-pick this for me? Thanks, Aaron > On Aug 14, 2019, at 1:13 PM, Udo Kohlmeyer wrote: > > @Aaron,Kirk - thank you for the clarification. > > +1 to include the fix, as reverting GEODE-7001 would be more effort :)

Re: New build warnings

2019-08-15 Thread Jacob Barrett
On that note, I’ve had a PR open to address all the API warnings for some time now. Would love a review. https://github.com/apache/geode/pull/3872 > On Aug 15, 2019, at 3:59 PM, Kirk Lund wrote: > > Just a reminder, that our many sun.misc.* warnings are drowning out real > warnings... > > We