Re: [Proposal] Change default gemfire.memoryEventTolerance from 1 to 0

2019-01-24 Thread Alexander Murmann
Great points, Ryan. I agree, if this is a bug fix rather than a
feature/improvement we should fix it.

On Thu, Jan 24, 2019 at 9:58 AM Ryan McMahon  wrote:

> Hi Alexander,
>
> Thanks for bring up those concerns.  I think in this case the default of 1
> is potentially masking behavior that the user would definitely want to see
> and react to (such as heap memory read outliers/spikes), so it is a bit
> different than other cases where changing the default would be a
> nice-to-have in 2.0.  The user can still adjust their parameter to deal
> with any bad reads they are observing, but it would be best if they did
> that explicitly instead of us hiding the behavior by default silently.  So
> I would argue it is a fair change in this case in a minor release, as it
> could be viewed as a "bug fix" i.e. masking the bad reads by default.
>
> Ryan
>
> On Thu, Jan 24, 2019 at 9:49 AM David Wisler  wrote:
>
> > I talked to him.  He is fine with our new change.
> >
> > :-)He just wanted to have it on the table.
> >
> > So you missed out on this 1 minute discussion.  haha
> >
> > On Thu, Jan 24, 2019 at 9:38 AM Ryan McMahon 
> wrote:
> >
> >> Hey David,
> >>
> >> Alexander responded to the dev list, but not sure you got a copy.
> >> Basically he is asking if we can wait to change the default tolerance
> until
> >> Geode 2.0, since it is a behavior change and apparently there are
> several
> >> config parameters for which we'd like to change the default, but we
> pushed
> >> it to 2.0.  I don't feel super strongly either way.  Do you want to
> reply
> >> to this thread or talk to Alexander directly?
> >>
> >> Ryan
> >>
> >> -- Forwarded message -
> >> From: Alexander Murmann 
> >> Date: Wed, Jan 23, 2019 at 4:13 PM
> >> Subject: Re: [Proposal] Change default gemfire.memoryEventTolerance from
> >> 1 to 0
> >> To: 
> >> Cc: Ryan McMahon 
> >>
> >>
> >> Ryan, thank you so much for the great explanation of your proposal!
> >>
> >> This seems very sound and you and David got me convinced that it's the
> >> right thing to change the default. To me the question now is one of
> >> timing.
> >> Is this something we can change in a minor release or do we have to wait
> >> for Geode 2.0? I think we have quite a few defaults we'd like to update.
> >> However, a user might have a system in prod that relies on defaults
> being
> >> a
> >> certain way and upgrading to the next minor shouldn't require any work
> on
> >> their end to prevent any negative impact on their system.
> >>
> >> Thoughts?
> >>
> >> On Tue, Jan 22, 2019 at 12:33 PM David Wisler 
> wrote:
> >>
> >> > I would add that, by changing the default to 0, we can then skip all
> of
> >> the
> >> > "special" logic that almost no customers use.With a default of 1,
> >> we go
> >> > into this logic every time unnecessarily, even when customers have not
> >> > explicitly told us to "tolerate" an eviction or critical state change.
> >>   I
> >> > am in favor of this default change to 0, and also add that there are
> no
> >> > customers who would even realize such a change in behavior has
> occurred.
> >> > I would also suggest that tolerating 1 critical reading, delaying the
> >> > subsequent behaviors in GemFire when above critical, could make us
> more
> >> > vulnerable to OOME's than would be the case by immediately
> transitioning
> >> > state.
> >> >
> >> > My 2 cents.  Thanks for the email Ryan.
> >> >
> >> >
> >> >
> >> > On Tue, Jan 22, 2019 at 10:22 AM Ryan McMahon 
> >> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > I am currently fixing a bug
> >> > > <https://issues.apache.org/jira/browse/GEODE-6304> with the
> >> > > HeapMemoryMonitor event tolerance feature, and came across a
> decision
> >> > that
> >> > > I thought would be more appropriate for the Geode dev list.
> >> > >
> >> > > For those familiar with the feature, we are proposing that the
> default
> >> > > gemfire.memoryEventTolerance config parameter value is changed from
> 1
> >> to
> >> > 0
> >> > > so state transitions from normal to eviction or critical o

Re: [Proposal] Change default gemfire.memoryEventTolerance from 1 to 0

2019-01-24 Thread Ryan McMahon
Hi Alexander,

Thanks for bring up those concerns.  I think in this case the default of 1
is potentially masking behavior that the user would definitely want to see
and react to (such as heap memory read outliers/spikes), so it is a bit
different than other cases where changing the default would be a
nice-to-have in 2.0.  The user can still adjust their parameter to deal
with any bad reads they are observing, but it would be best if they did
that explicitly instead of us hiding the behavior by default silently.  So
I would argue it is a fair change in this case in a minor release, as it
could be viewed as a "bug fix" i.e. masking the bad reads by default.

Ryan

On Thu, Jan 24, 2019 at 9:49 AM David Wisler  wrote:

> I talked to him.  He is fine with our new change.
>
> :-)He just wanted to have it on the table.
>
> So you missed out on this 1 minute discussion.  haha
>
> On Thu, Jan 24, 2019 at 9:38 AM Ryan McMahon  wrote:
>
>> Hey David,
>>
>> Alexander responded to the dev list, but not sure you got a copy.
>> Basically he is asking if we can wait to change the default tolerance until
>> Geode 2.0, since it is a behavior change and apparently there are several
>> config parameters for which we'd like to change the default, but we pushed
>> it to 2.0.  I don't feel super strongly either way.  Do you want to reply
>> to this thread or talk to Alexander directly?
>>
>> Ryan
>>
>> -- Forwarded message -----
>> From: Alexander Murmann 
>> Date: Wed, Jan 23, 2019 at 4:13 PM
>> Subject: Re: [Proposal] Change default gemfire.memoryEventTolerance from
>> 1 to 0
>> To: 
>> Cc: Ryan McMahon 
>>
>>
>> Ryan, thank you so much for the great explanation of your proposal!
>>
>> This seems very sound and you and David got me convinced that it's the
>> right thing to change the default. To me the question now is one of
>> timing.
>> Is this something we can change in a minor release or do we have to wait
>> for Geode 2.0? I think we have quite a few defaults we'd like to update.
>> However, a user might have a system in prod that relies on defaults being
>> a
>> certain way and upgrading to the next minor shouldn't require any work on
>> their end to prevent any negative impact on their system.
>>
>> Thoughts?
>>
>> On Tue, Jan 22, 2019 at 12:33 PM David Wisler  wrote:
>>
>> > I would add that, by changing the default to 0, we can then skip all of
>> the
>> > "special" logic that almost no customers use.With a default of 1,
>> we go
>> > into this logic every time unnecessarily, even when customers have not
>> > explicitly told us to "tolerate" an eviction or critical state change.
>>   I
>> > am in favor of this default change to 0, and also add that there are no
>> > customers who would even realize such a change in behavior has occurred.
>> > I would also suggest that tolerating 1 critical reading, delaying the
>> > subsequent behaviors in GemFire when above critical, could make us more
>> > vulnerable to OOME's than would be the case by immediately transitioning
>> > state.
>> >
>> > My 2 cents.  Thanks for the email Ryan.
>> >
>> >
>> >
>> > On Tue, Jan 22, 2019 at 10:22 AM Ryan McMahon 
>> wrote:
>> >
>> > > Hi all,
>> > >
>> > > I am currently fixing a bug
>> > > <https://issues.apache.org/jira/browse/GEODE-6304> with the
>> > > HeapMemoryMonitor event tolerance feature, and came across a decision
>> > that
>> > > I thought would be more appropriate for the Geode dev list.
>> > >
>> > > For those familiar with the feature, we are proposing that the default
>> > > gemfire.memoryEventTolerance config parameter value is changed from 1
>> to
>> > 0
>> > > so state transitions from normal to eviction or critical occur
>> > immediately
>> > > after reading a single heap-used-bytes event above threshold.  If you
>> are
>> > > unfamiliar with the feature, read on.
>> > >
>> > > The memory event tolerance feature addresses issues with some JVM
>> distros
>> > > that result in sporadic, erroneously high heap-bytes-used readings.
>> The
>> > > feature was introduced to address this issue in the JRockit JVM, but
>> it
>> > has
>> > > been found that other JVM distros are susceptible to this problem as
>> > well.
>> > >
>> > > The feature prevents an "unexpect

Re: [Proposal] Change default gemfire.memoryEventTolerance from 1 to 0

2019-01-23 Thread Alexander Murmann
Ryan, thank you so much for the great explanation of your proposal!

This seems very sound and you and David got me convinced that it's the
right thing to change the default. To me the question now is one of timing.
Is this something we can change in a minor release or do we have to wait
for Geode 2.0? I think we have quite a few defaults we'd like to update.
However, a user might have a system in prod that relies on defaults being a
certain way and upgrading to the next minor shouldn't require any work on
their end to prevent any negative impact on their system.

Thoughts?

On Tue, Jan 22, 2019 at 12:33 PM David Wisler  wrote:

> I would add that, by changing the default to 0, we can then skip all of the
> "special" logic that almost no customers use.With a default of 1, we go
> into this logic every time unnecessarily, even when customers have not
> explicitly told us to "tolerate" an eviction or critical state change.I
> am in favor of this default change to 0, and also add that there are no
> customers who would even realize such a change in behavior has occurred.
> I would also suggest that tolerating 1 critical reading, delaying the
> subsequent behaviors in GemFire when above critical, could make us more
> vulnerable to OOME's than would be the case by immediately transitioning
> state.
>
> My 2 cents.  Thanks for the email Ryan.
>
>
>
> On Tue, Jan 22, 2019 at 10:22 AM Ryan McMahon  wrote:
>
> > Hi all,
> >
> > I am currently fixing a bug
> >  with the
> > HeapMemoryMonitor event tolerance feature, and came across a decision
> that
> > I thought would be more appropriate for the Geode dev list.
> >
> > For those familiar with the feature, we are proposing that the default
> > gemfire.memoryEventTolerance config parameter value is changed from 1 to
> 0
> > so state transitions from normal to eviction or critical occur
> immediately
> > after reading a single heap-used-bytes event above threshold.  If you are
> > unfamiliar with the feature, read on.
> >
> > The memory event tolerance feature addresses issues with some JVM distros
> > that result in sporadic, erroneously high heap-bytes-used readings.  The
> > feature was introduced to address this issue in the JRockit JVM, but it
> has
> > been found that other JVM distros are susceptible to this problem as
> well.
> >
> > The feature prevents an "unexpected" state transition from a normal state
> > to an eviction or critical state by requiring N (configurable)
> consecutive
> > heap-used-byte events above threshold before changing states.  The
> current
> > default configuration is N = 5 for JRockit and N = 1 for all other JVMs.
> > In a non-JRockit JVM, this configuration permits a single event above
> > threshold WITHOUT causing a state transition.  In other words, by
> default,
> > we allow for a single bad outlier heap-used-bytes reading without going
> > into an eviction or critical state.
> >
> > As part of this bug fix (which involves a failure to reset the tolerance
> > counter under some conditions), we opted to remove the special handling
> for
> > JRockit because JRockit is no longer supported.  After removing the
> JRockit
> > handling, we started re-evaluating if a default value of 1 is appropriate
> > for all other JVMs.  We are considering changing the default to 0, so
> state
> > transitions would occur immediately if an event above the threshold is
> > received.  If a user is facing one of these problematic JVMs, they can
> then
> > change the gemfire.memoryEventTolerance config parameter to increase the
> > tolerance.  Our concern is that the default today is potentially masking
> > bad heap readings without the user ever knowing.
> >
> > To summarize, if we change the default from 1 to 0 it would potentially
> be
> > a change in behavior in that we would no longer be masking a single bad
> > heap-used-bytes reading i.e. no longer permitting a single outlier
> without
> > changing states.  The user can then decide whether to configure a
> non-zero
> > tolerance to address the situation.  Any thoughts on this change in
> > behavior?
> >
> > Thanks,
> > Ryan
> >
> >
> >
> >
> >
> >
> >
> >
>
> --
>
> David Wisler  |  GemFire Support Product Manager  |  503-810-7840 cell
> Support.Pivotal.io
> <
> http://www.google.com/url?q=http%3A%2F%2Fsupport.pivotal.io%2F=D=1=AFQjCNGDBr_XSKC18wot5h3OkKoZ84Vn7Q
> >
>   |  Mon-Fri  8:00am to 5:00pm PST  |  1-877-477-2269
> [image: support]
> <
> https://www.google.com/url?q=https%3A%2F%2Fsupport.pivotal.io%2F=D=1=AFQjCNEvwKLjzu29inKwy4jJjKsboqGMCg
> >
>  [image: twitter]
> <
> https://www.google.com/url?q=https%3A%2F%2Ftwitter.com%2Fpivotal=D=1=AFQjCNG1FcqkH5ghKsSG6UkdeUzjSuDSHg
> >
>  [image: linkedin]
> <
> https://www.google.com/url?q=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2F3048967=D=1=AFQjCNHOQGYmDYIQz06S3-vAuqzf8bN8Yw
> >
>  [image: facebook]
> <
> 

Re: [Proposal] Change default gemfire.memoryEventTolerance from 1 to 0

2019-01-22 Thread David Wisler
I would add that, by changing the default to 0, we can then skip all of the
"special" logic that almost no customers use.With a default of 1, we go
into this logic every time unnecessarily, even when customers have not
explicitly told us to "tolerate" an eviction or critical state change.I
am in favor of this default change to 0, and also add that there are no
customers who would even realize such a change in behavior has occurred.
I would also suggest that tolerating 1 critical reading, delaying the
subsequent behaviors in GemFire when above critical, could make us more
vulnerable to OOME's than would be the case by immediately transitioning
state.

My 2 cents.  Thanks for the email Ryan.



On Tue, Jan 22, 2019 at 10:22 AM Ryan McMahon  wrote:

> Hi all,
>
> I am currently fixing a bug
>  with the
> HeapMemoryMonitor event tolerance feature, and came across a decision that
> I thought would be more appropriate for the Geode dev list.
>
> For those familiar with the feature, we are proposing that the default
> gemfire.memoryEventTolerance config parameter value is changed from 1 to 0
> so state transitions from normal to eviction or critical occur immediately
> after reading a single heap-used-bytes event above threshold.  If you are
> unfamiliar with the feature, read on.
>
> The memory event tolerance feature addresses issues with some JVM distros
> that result in sporadic, erroneously high heap-bytes-used readings.  The
> feature was introduced to address this issue in the JRockit JVM, but it has
> been found that other JVM distros are susceptible to this problem as well.
>
> The feature prevents an "unexpected" state transition from a normal state
> to an eviction or critical state by requiring N (configurable) consecutive
> heap-used-byte events above threshold before changing states.  The current
> default configuration is N = 5 for JRockit and N = 1 for all other JVMs.
> In a non-JRockit JVM, this configuration permits a single event above
> threshold WITHOUT causing a state transition.  In other words, by default,
> we allow for a single bad outlier heap-used-bytes reading without going
> into an eviction or critical state.
>
> As part of this bug fix (which involves a failure to reset the tolerance
> counter under some conditions), we opted to remove the special handling for
> JRockit because JRockit is no longer supported.  After removing the JRockit
> handling, we started re-evaluating if a default value of 1 is appropriate
> for all other JVMs.  We are considering changing the default to 0, so state
> transitions would occur immediately if an event above the threshold is
> received.  If a user is facing one of these problematic JVMs, they can then
> change the gemfire.memoryEventTolerance config parameter to increase the
> tolerance.  Our concern is that the default today is potentially masking
> bad heap readings without the user ever knowing.
>
> To summarize, if we change the default from 1 to 0 it would potentially be
> a change in behavior in that we would no longer be masking a single bad
> heap-used-bytes reading i.e. no longer permitting a single outlier without
> changing states.  The user can then decide whether to configure a non-zero
> tolerance to address the situation.  Any thoughts on this change in
> behavior?
>
> Thanks,
> Ryan
>
>
>
>
>
>
>
>

-- 

David Wisler  |  GemFire Support Product Manager  |  503-810-7840 cell
Support.Pivotal.io

  |  Mon-Fri  8:00am to 5:00pm PST  |  1-877-477-2269
[image: support]

 [image: twitter]

 [image: linkedin]

 [image: facebook]

 [image: google plus]  [image: youtube]



[Proposal] Change default gemfire.memoryEventTolerance from 1 to 0

2019-01-22 Thread Ryan McMahon
Hi all,

I am currently fixing a bug
 with the
HeapMemoryMonitor event tolerance feature, and came across a decision that
I thought would be more appropriate for the Geode dev list.

For those familiar with the feature, we are proposing that the default
gemfire.memoryEventTolerance config parameter value is changed from 1 to 0
so state transitions from normal to eviction or critical occur immediately
after reading a single heap-used-bytes event above threshold.  If you are
unfamiliar with the feature, read on.

The memory event tolerance feature addresses issues with some JVM distros
that result in sporadic, erroneously high heap-bytes-used readings.  The
feature was introduced to address this issue in the JRockit JVM, but it has
been found that other JVM distros are susceptible to this problem as well.

The feature prevents an "unexpected" state transition from a normal state
to an eviction or critical state by requiring N (configurable) consecutive
heap-used-byte events above threshold before changing states.  The current
default configuration is N = 5 for JRockit and N = 1 for all other JVMs.
In a non-JRockit JVM, this configuration permits a single event above
threshold WITHOUT causing a state transition.  In other words, by default,
we allow for a single bad outlier heap-used-bytes reading without going
into an eviction or critical state.

As part of this bug fix (which involves a failure to reset the tolerance
counter under some conditions), we opted to remove the special handling for
JRockit because JRockit is no longer supported.  After removing the JRockit
handling, we started re-evaluating if a default value of 1 is appropriate
for all other JVMs.  We are considering changing the default to 0, so state
transitions would occur immediately if an event above the threshold is
received.  If a user is facing one of these problematic JVMs, they can then
change the gemfire.memoryEventTolerance config parameter to increase the
tolerance.  Our concern is that the default today is potentially masking
bad heap readings without the user ever knowing.

To summarize, if we change the default from 1 to 0 it would potentially be
a change in behavior in that we would no longer be masking a single bad
heap-used-bytes reading i.e. no longer permitting a single outlier without
changing states.  The user can then decide whether to configure a non-zero
tolerance to address the situation.  Any thoughts on this change in
behavior?

Thanks,
Ryan