Re: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

2021-03-12 Thread Donal Evans
> That 4.6% degradation is within our thresholds so it is possible this came in 
> well before the first time it was detected.

After doing some bisecting, I've found that Geode SHA 986733ec (committed on 
Nov 20th 2020) shows an average degradation of ~1% compared to the 1.13 
baseline, whereas e26d7595 (committed on Dec 3rd) shows an average degradation 
of ~5%, so there's definitely been a performance degradation introduced 
somewhere between those two commits. The fact that these numbers come from an 
average of 10 runs is relevant too, since part of the reason we have a 
threshold is that we know there is some noise associated with the test. A 5% 
degradation in an individual run is nothing to worry about, but a consistent 
average degradation of the same amount is much more serious. I'm continuing to 
bisect down to an individual SHA and hope to get it pinned down eventually (I 
currently have a range of about 20 candidate commits), but it's slow going as I 
have to run the benchmark multiple times.
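
A minimal sketch of why an average over 10 runs carries more weight than a 
single run. The per-run degradation figures below are made up for illustration 
and are not benchmark results; the helper name summarize is hypothetical. The 
idea is only that the standard error of the mean shrinks with the square root 
of the number of runs, so a ~5% average is far less likely to be noise than a 
single 5% reading.

```python
import math
import statistics


def summarize(degradations):
    """Return (mean, standard error of the mean) for per-run degradations in percent."""
    mean = statistics.mean(degradations)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    sem = statistics.stdev(degradations) / math.sqrt(len(degradations))
    return mean, sem


# Illustrative, invented per-run degradations (percent) versus a baseline.
# Individual runs scatter by a few percent, as a noisy benchmark would.
runs = [5.0, 3.0, 7.0, 4.0, 6.0, 5.0, 2.0, 8.0, 5.0, 5.0]

mean, sem = summarize(runs)
print(f"mean degradation: {mean:.2f}% +/- {sem:.2f}% (standard error, n={len(runs)})")
```

With numbers like these, any single run could plausibly be explained by noise, 
while the mean sits several standard errors away from zero.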

From: Jacob Barrett 
Sent: Friday, March 12, 2021 1:55 PM
To: dev@geode.apache.org 
Subject: Re: Proposal: Add GEODE-8950 (performance degradation in 
P2pPartitionedPutLongBenchmark) to 1.14 blockers list

That 4.6% degradation is within our thresholds so it is possible this came in 
well before the first time it was detected.

Some context on the benchmark for those unfamiliar: 
P2pPartitionedPutLongBenchmark performs a fire hose of puts with Long key and 
Long value directly from each of the two servers. This results in a lot of 
little p2p messages being exchanged between the servers. Presumably 50% of the 
puts result in a forward to the primary bucket, in addition to the replication 
message sent for every put. This test can be susceptible to the smallest 
alteration in garbage production/collection, hot methods, locks, etc.

This test is a very unlikely scenario in production. I am not sure that it 
constitutes a blocking condition, but it is troubling. I will give a neutral 
vote on making it a blocker.

> On Mar 12, 2021, at 11:19 AM, Donal Evans  wrote:
>
> After some investigation, it appears that a degradation has been introduced 
> that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. 
> Efforts are underway to narrow down the cause to a single commit, but the 
> degradation was definitely introduced in 1.14, so I believe this should be 
> considered a 1.14 release blocker: 
> https://issues.apache.org/jira/browse/GEODE-8950



Re: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

2021-03-12 Thread Jacob Barrett
That 4.6% degradation is within our thresholds so it is possible this came in 
well before the first time it was detected.

Some context on the benchmark for those unfamiliar: 
P2pPartitionedPutLongBenchmark performs a fire hose of puts with Long key and 
Long value directly from each of the two servers. This results in a lot of 
little p2p messages being exchanged between the servers. Presumably 50% of the 
puts result in a forward to the primary bucket, in addition to the replication 
message sent for every put. This test can be susceptible to the smallest 
alteration in garbage production/collection, hot methods, locks, etc.
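
The message arithmetic above can be sketched with a toy model. This is not 
Geode code: the bucket count, the key-to-bucket mapping, the alternating 
primary assignment, and the single redundant copy are all assumptions chosen 
to make the 50% figure visible, and count_messages is a hypothetical helper.

```python
def count_messages(keys, origin, num_servers=2, num_buckets=8):
    """Count forwards and replication messages for puts issued on server `origin`.

    Assumptions (illustrative only): a key maps to bucket key % num_buckets,
    bucket primaries alternate across servers, and each put is replicated once.
    """
    forwards = 0
    replications = 0
    for key in keys:
        bucket = key % num_buckets
        primary = bucket % num_servers
        if primary != origin:
            # The put landed on a server that does not host the primary bucket.
            forwards += 1
        # Every put is replicated to the redundant copy, wherever it started.
        replications += 1
    return forwards, replications


forwards, replications = count_messages(range(1000), origin=0)
print(f"forwards: {forwards}, replications: {replications}")
```

Under these assumptions half of the puts need a forward and every put needs a 
replication message, so the benchmark is dominated by small server-to-server 
messages.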

This test is a very unlikely scenario in production. I am not sure that it 
constitutes a blocking condition, but it is troubling. I will give a neutral 
vote on making it a blocker.

> On Mar 12, 2021, at 11:19 AM, Donal Evans  wrote:
> 
> After some investigation, it appears that a degradation has been introduced 
> that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. 
> Efforts are underway to narrow down the cause to a single commit, but the 
> degradation was definitely introduced in 1.14, so I believe this should be 
> considered a 1.14 release blocker: 
> https://issues.apache.org/jira/browse/GEODE-8950



Re: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

2021-03-12 Thread Owen Nichols
Thanks for spotting this and looking into it. Let's keep it on the blocker list 
and get a better understanding.

On 3/12/21, 11:21 AM, "Nabarun Nag"  wrote:

+1

From: Donal Evans 
Sent: Friday, March 12, 2021 11:19 AM
To: dev@geode.apache.org 
Subject: Proposal: Add GEODE-8950 (performance degradation in 
P2pPartitionedPutLongBenchmark) to 1.14 blockers list

After some investigation, it appears that a degradation has been introduced 
that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. 
Efforts are underway to narrow down the cause to a single commit, but the 
degradation was definitely introduced in 1.14, so I believe this should be 
considered a 1.14 release blocker: 
https://issues.apache.org/jira/browse/GEODE-8950



Re: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

2021-03-12 Thread Nabarun Nag
+1

From: Donal Evans 
Sent: Friday, March 12, 2021 11:19 AM
To: dev@geode.apache.org 
Subject: Proposal: Add GEODE-8950 (performance degradation in 
P2pPartitionedPutLongBenchmark) to 1.14 blockers list

After some investigation, it appears that a degradation has been introduced 
that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. 
Efforts are underway to narrow down the cause to a single commit, but the 
degradation was definitely introduced in 1.14, so I believe this should be 
considered a 1.14 release blocker: 
https://issues.apache.org/jira/browse/GEODE-8950


Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

2021-03-12 Thread Donal Evans
After some investigation, it appears that a degradation has been introduced 
that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. 
Efforts are underway to narrow down the cause to a single commit, but the 
degradation was definitely introduced in 1.14, so I believe this should be 
considered a 1.14 release blocker: 
https://issues.apache.org/jira/browse/GEODE-8950