Re: Update on CI migration

2024-02-11 Thread Kirk Lund
Hi Sai,

I'd like to help out on this or take it over. If you have time, please loop
me in with any contact info for the infra team, details about what is
currently disabled, details about what is currently blocked (just running
Windows tests?), and I'll try reaching out to them again for some help.

Thanks,
Kirk

On Wed, Jul 19, 2023 at 3:08 PM Mark Bretl  wrote:

> Thanks for the update Sai. Your work is much appreciated!
>
> --Mark
>
> On Wed, Jul 12, 2023 at 9:02 PM Sai Boorlagadda  >
> wrote:
>
> > I found out that the infra team is working actively on providing larger
> > instances for jobs in Github actions. I will resume my work as soon I see
> > they have an option.
> >
> > Other than that I couldn't keep up contributing to this project in the
> last
> > few weeks.
> >
> > Sai
> >
>


spotlessGroovyGradleCheck failures in clean checkout on windows

2023-05-11 Thread Kirk Lund
Anyone know why a fresh checkout of develop has all of the following
failures on Windows?

> ./gradlew clean
> ./gradlew build -x test

> Task :geode-cq:spotlessGroovyGradleCheck FAILED
> Task :geode-server-all:spotlessGroovyGradleCheck FAILED
> Task :geode-membership:spotlessGroovyGradleCheck FAILED
> Task :geode-assembly:geode-assembly-test:spotlessGroovyGradleCheck FAILED
> Task :extensions:geode-modules:spotlessGroovyGradleCheck FAILED


> Task :geode-logging:spotlessGroovyGradleCheck FAILED
> Task :extensions:geode-modules-test:spotlessGroovyGradleCheck FAILED
> Task :geode-tcp-server:spotlessGroovyGradleCheck FAILED
> Task :geode-connectors:spotlessGroovyGradleCheck FAILED
> Task :geode-management:spotlessGroovyGradleCheck FAILED
> Task :geode-rebalancer:spotlessGroovyGradleCheck FAILED


> Task :geode-lucene:spotlessGroovyGradleCheck FAILED
> Task :geode-unsafe:spotlessGroovyGradleCheck FAILED
> Task :geode-serialization:spotlessGroovyGradleCheck FAILED
> Task :extensions:session-testing-war:spotlessGroovyGradleCheck FAILED
> Task :geode-common:spotlessGroovyGradleCheck FAILED
> Task :geode-dunit:spotlessGroovyGradleCheck FAILED


> Task :geode-gfsh:spotlessGroovyGradleCheck FAILED
> Task :geode-junit:spotlessGroovyGradleCheck FAILED


> Task :geode-web-management:spotlessGroovyGradleCheck FAILED
> Task :geode-log4j:spotlessGroovyGradleCheck FAILED
> Task :geode-pulse:spotlessGroovyGradleCheck FAILED
> Task :geode-core:spotlessGroovyGradleCheck FAILED
> Task :geode-assembly:spotlessGroovyGradleCheck FAILED


> Task :geode-http-service:spotlessGroovyGradleCheck FAILED


> Task :geode-deployment:geode-deployment-legacy:spotlessGroovyGradleCheck
FAILED

> Task :geode-old-client-support:spotlessGroovyGradleCheck FAILED


> Task :geode-memcached:spotlessGroovyGradleCheck FAILED
> Task :extensions:geode-modules-session-internal:spotlessGroovyGradleCheck
FAILED

> Task :geode-wan:spotlessGroovyGradleCheck FAILED
> Task :extensions:geode-modules-session:spotlessGroovyGradleCheck FAILED


> Task :extensions:geode-modules-tomcat7:spotlessGroovyGradleCheck FAILED


> Task :extensions:geode-modules-tomcat8:spotlessGroovyGradleCheck FAILED


> Task :extensions:geode-modules-tomcat9:spotlessGroovyGradleCheck FAILED


> Task :geode-concurrency-test:spotlessGroovyGradleCheck FAILED


> Task :geode-web-api:spotlessGroovyGradleCheck FAILED
> Task :geode-lucene:geode-lucene-test:spotlessGroovyGradleCheck FAILED


> Task :geode-jmh:spotlessGroovyGradleCheck FAILED
> Task :static-analysis:spotlessGroovyGradleCheck FAILED


> Task :geode-pulse:geode-pulse-test:spotlessGroovyGradleCheck FAILED



> Task :boms:geode-client-bom:spotlessGroovyGradle
Errors occurred while build effective model from
C:\Users\kirkl\.gradle\caches\modules-2\files-2.1\org.eclipse.platform\org.eclipse.core.expressions\3.8.200\eb159f34083b0135459745f934a6ad5eb61b61c\org.eclipse.core.expressions-3.8.200.pom:


'modelVersion' must be one of [4.0.0] but is '4.0'. in
org.eclipse.platform:org.eclipse.core.expressions:3.8.200

Errors occurred while build effective model from
C:\Users\kirkl\.gradle\caches\modules-2\files-2.1\org.eclipse.platform\org.eclipse.core.filesystem\1.9.500\a170901fc0ef24de65515b0a96dc02fb6175\org.eclipse.core.filesystem-1.9.500.pom:
'modelVersion' must be one of [4.0.0] but is '4.0'. in
org.eclipse.platform:org.eclipse.core.filesystem:1.9.500

Errors occurred while build effective model from
C:\Users\kirkl\.gradle\caches\modules-2\files-2.1\org.eclipse.platform\org.eclipse.swt\3.123.0\9e43f16c2ff40b4e45195d447e9f764853356158\org.eclipse.swt-3.123.0.pom:
'dependencies.dependency.artifactId' for
org.eclipse.platform:org.eclipse.swt.${osgi.platform}:jar with value
'org.eclipse.swt.${osgi.platform}' does not match a valid id pattern. in
org.eclipse.platform:org.eclipse.swt:3.123.0



> Task :boms:geode-client-bom:spotlessGroovyGradleCheck FAILED


> Task :extensions:geode-modules-assembly:spotlessGroovyGradleCheck FAILED


> Task :static-analysis:pmd-rules:spotlessGroovyGradleCheck FAILED


> Task :geode-web:spotlessGroovyGradleCheck FAILED


Re: Update on DUnit test pipeline on Github actions

2023-04-29 Thread Kirk Lund
Did INFRA give any hint as to when they might provide the bigger VMs?

On Mon, Apr 17, 2023 at 8:24 PM Sai Boorlagadda 
wrote:

> I went ahead and merged github workflow jobs that tests
> WAN, CQ, Assembly and Managment distributed tests.
>
> Free workers VMs has 2 cores and tuning any sort of
> parameters isn't speeding up geode-core DUnits.
>
> Talking to infra team found that infra is working on providing
> self-hosted (sponsored by infra) that are much bigger VMs.
>
> So until such VMs are available I am going to find if there are any
> alternate solution.
>
> On Thu, 13 Apr 2023 at 15:08, Kirk Lund  wrote:
>
> > I see that there is at least one person concerned about DUnit tests
> > requiring longer timeouts. This is the current situation with an unknown
> > number of the DUnit tests. One possibility is to move the worst offenders
> > to a new src set within geode-core and then give that its own job with a
> > larger timeout. The longer term solution is to fix or even rewrite some
> of
> > those tests. Excluding them is also not a viable option as we risk
> > losing important test coverage that way. I agree that some of these tests
> > need a lot more help than tweaking overall job timeout values, but
> without
> > a lot more time commitment from contributors that might not be an
> > option for some time.
> >
> > -Kirk
> >
> > On Thu, Apr 13, 2023 at 3:02 PM Kirk Lund  wrote:
> >
> > > Is the coreDistributedTests the only dunit job that currently takes too
> > > long? If it is we may want to split that into more than one job.
> > >
> > > -Kirk
> > >
> > > On Wed, Apr 12, 2023 at 7:58 PM Sai Boorlagadda <
> > sai.boorlaga...@gmail.com>
> > > wrote:
> > >
> > >> All,
> > >>
> > >> There is an upper bound for job execution time on free workers (set
> to 6
> > >> hours max[1]), which can be configured beyond 6hrs with a self-hosted
> > >> worker. All of our pipeline jobs are using `--max-workers` to
> > parallelize
> > >> gradle tasks but `testMaxParallelForks` is left to default which is
> > (1/4th
> > >> of the available CPU cores), so primarily due to running only a single
> > >> test
> > >> in each parallel fork geode-core distribution tests are taking more
> > than 6
> > >> hours. Other than finding a solution for core distributed tests, most
> > >> DUnit
> > >> tests are passed[2] by splitting them into individual jobs (WAN, CQ,
> > >> Lucene, assembly, management).
> > >>
> > >> Will reach out to infra team and trying playing with `--max-workers`
> to
> > >> parallelize more tests than having to run parallel tests with in a
> fork
> > >> would be options.
> > >>
> > >> I am going to wait for few days to get answers from infra team before
> I
> > >> can
> > >> create a PR to add at least the passing DUnits.
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration
> > >> [2] https://github.com/apache/geode/actions/runs/4639012912
> > >>
> > >> Sai
> > >>
> > >
> >
>


Re: Update on DUnit test pipeline on Github actions

2023-04-13 Thread Kirk Lund
I see that there is at least one person concerned about DUnit tests
requiring longer timeouts. This is the current situation with an unknown
number of the DUnit tests. One possibility is to move the worst offenders
to a new src set within geode-core and then give that its own job with a
larger timeout. The longer term solution is to fix or even rewrite some of
those tests. Excluding them is also not a viable option as we risk
losing important test coverage that way. I agree that some of these tests
need a lot more help than tweaking overall job timeout values, but without
a lot more time commitment from contributors that might not be an
option for some time.

-Kirk

On Thu, Apr 13, 2023 at 3:02 PM Kirk Lund  wrote:

> Is the coreDistributedTests the only dunit job that currently takes too
> long? If it is we may want to split that into more than one job.
>
> -Kirk
>
> On Wed, Apr 12, 2023 at 7:58 PM Sai Boorlagadda 
> wrote:
>
>> All,
>>
>> There is an upper bound for job execution time on free workers (set to 6
>> hours max[1]), which can be configured beyond 6hrs with a self-hosted
>> worker. All of our pipeline jobs are using `--max-workers` to parallelize
>> gradle tasks but `testMaxParallelForks` is left to default which is (1/4th
>> of the available CPU cores), so primarily due to running only a single
>> test
>> in each parallel fork geode-core distribution tests are taking more than 6
>> hours. Other than finding a solution for core distributed tests, most
>> DUnit
>> tests are passed[2] by splitting them into individual jobs (WAN, CQ,
>> Lucene, assembly, management).
>>
>> Will reach out to infra team and trying playing with `--max-workers` to
>> parallelize more tests than having to run parallel tests with in a fork
>> would be options.
>>
>> I am going to wait for few days to get answers from infra team before I
>> can
>> create a PR to add at least the passing DUnits.
>>
>> [1]
>>
>> https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration
>> [2] https://github.com/apache/geode/actions/runs/4639012912
>>
>> Sai
>>
>


Re: Update on DUnit test pipeline on Github actions

2023-04-13 Thread Kirk Lund
Is the coreDistributedTests the only dunit job that currently takes too
long? If it is we may want to split that into more than one job.

-Kirk

On Wed, Apr 12, 2023 at 7:58 PM Sai Boorlagadda 
wrote:

> All,
>
> There is an upper bound for job execution time on free workers (set to 6
> hours max[1]), which can be configured beyond 6hrs with a self-hosted
> worker. All of our pipeline jobs are using `--max-workers` to parallelize
> gradle tasks but `testMaxParallelForks` is left to default which is (1/4th
> of the available CPU cores), so primarily due to running only a single test
> in each parallel fork geode-core distribution tests are taking more than 6
> hours. Other than finding a solution for core distributed tests, most DUnit
> tests are passed[2] by splitting them into individual jobs (WAN, CQ,
> Lucene, assembly, management).
>
> Will reach out to infra team and trying playing with `--max-workers` to
> parallelize more tests than having to run parallel tests with in a fork
> would be options.
>
> I am going to wait for few days to get answers from infra team before I can
> create a PR to add at least the passing DUnits.
>
> [1]
>
> https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration
> [2] https://github.com/apache/geode/actions/runs/4639012912
>
> Sai
>


Approved your PR Sai

2023-02-13 Thread Kirk Lund
Sai,

I approved the only non-draft PR you had waiting for review. Feel free to
ping slack or email me if you have something waiting on me.

Thanks,
Kirk


Re: isolated test plugin

2023-02-13 Thread Kirk Lund
Let's have a longer chat about what I know in slack. I think a lot of this
was done for running tests in parallel.

-Kirk

On Mon, Jan 23, 2023 at 9:50 PM Sai Boorlagadda 
wrote:

> Could someone provide context on how integration & dunit tests are run in
> CI?
>
> I remember geode had dockerized test plugin that isolates test execution.
> But looking at this PR[1] it seems we removed usage of that plugin and have
> a customized mechanism to allocate ports so that we don't need to run tests
> in docker containers.
>
> But the CI source (execute_tests.sh) seems to still pass docker
> image -PdunitDockerUser=geode -PdunitDockerImage=\$(docker images. So do we
> need to run these tests in docker containers?
>
> 1. https://github.com/apache/geode/pull/6720
>
> Sai
>


Bad Practice: Static constant values set by system properties

2023-01-22 Thread Kirk Lund
Sai and I worked together yesterday and fixed almost all of the failing
unit tests in the new build we're trying to set up. I think we have one
failure remaining. Most of the failures we fixed turned out to be caused by
new code that was committed to Geode in 2022 that causes values in static
constants that are set by system properties breaking later unit tests that
will only pass if the constant has the default value.

I want to point out that using system properties to set the values in a
static constant is definitely an anti-pattern or bad practice that should
be avoided going forward. It pollutes the JVM for later unit tests and
cannot be unset because constants are defined as final and static.

In the coming weeks, I plan to update the Apache Geode wiki to spell out
the anti-pattern bad practices that we need to avoid going forward and this
will definitely be one of them.

Thanks,
Kirk


Re: Backport plan for reported CVEs on 1.13.x branch

2022-12-08 Thread Kirk Lund
Hi Ankush,

We are in need of more contributors if you're interested in helping!

At this point, we're trying to get a new CI up and running. Then we hope to
maintain the latest version 1.15.x and eventually discuss the feasibility
of new minor or major releases.

Regards,
Kirk

On Tue, Dec 6, 2022 at 5:58 AM Ankush Mittal 
wrote:

> Hi All,
>
> Some of the vulnerabilities reported under [0] seems to be fix in latest
> version of Geode i.e. 1.15.x but fix version doesn’t include 1.13.x, while
> 1.13.x is also impacted. [1]
> Can someone please confirm if the vulnerabilities fix will make it to
> 1.13.9 and what will be the tentative release schedule of the same.
> [2] doesn’t include the release schedule.
>
> Any inputs will be helpful in this regard!
>
> -Thanks & Regards
> Ankush Mittal
>
> [0] https://issues.apache.org/jira/browse/GEODE-10415
> [1] https://issues.apache.org/jira/browse/GEODE-10406
> [2] https://cwiki.apache.org/confluence/display/GEODE/Release+Schedule
>


Re: rewrite CI pipeline

2022-12-08 Thread Kirk Lund
Hey Sai,

Quick update on this. On my Mac, I'm able to build and run all unit tests
without any failures using the very latest Oracle Java 8 JDK. I'll try
OpenJDK and Azul next.

I do have a number of IntegrationTest failures. Most are BindException
failures because some process is already bound to a port. One or two
failures appear to be DNS related which may require either editing
/etc/hosts or changing the test. Will update again after debugging these
more.

Cheers,
Kirk

On Fri, Dec 2, 2022 at 12:58 PM Kirk Lund  wrote:

> Sai, I'll be able to spend some time this weekend on these failing tests.
> Hope to have some more news to share soon!
>
> -Kirk
>
> On Mon, Nov 21, 2022 at 4:35 PM Sai Boorlagadda 
> wrote:
>
>> I was thinking it has to do with either JDK or ubuntu. I also tried
>> 'temurin'
>> and there were different sets of errors. So to see if the devs can shed
>> some light on JDK dependency or any prior knowledge on choosing
>> bellsoft distribution.
>>
>> Sai
>>
>> On Mon, 21 Nov 2022 at 16:25, Kirk Lund  wrote:
>>
>> > It's also possible that these issues are specific to using OpenJDK 8 or
>> > Ubuntu. I thought we were building and testing with CentOS before.
>> >
>> > On Mon, Nov 21, 2022 at 4:20 PM Kirk Lund  wrote:
>> >
>> > > Looks like membership errors. Membership went through extensive
>> changes
>> > to
>> > > be modularized as geode-membership, and VMware was still fixing some
>> bugs
>> > > in it when I left. These could be really difficult to solve for those
>> of
>> > us
>> > > who have no experience with the new membership.
>> > >
>> > > I wonder if we could engage VMware for a little more help on these
>> > > failures before they depart entirely?
>> > >
>> > > Last I heard there were also problems with using OpenJDK 8 but I don't
>> > > know the details. VMware was using Liberica JDK 8 from Bellsoft. We
>> may
>> > > also need VMware to explain what the issue was with OpenJDK 8 so we
>> can
>> > > figure out what to do about it.
>> > >
>> > > On Mon, Nov 21, 2022 at 3:29 PM Sai Boorlagadda <
>> > sai.boorlaga...@gmail.com>
>> > > wrote:
>> > >
>> > >> Hello devs,
>> > >>
>> > >> I started a parallel activity to tease out CI work on using ASF
>> jenkins
>> > >> and
>> > >> also Github actions (though GHA has several issues). I wanted to
>> provide
>> > >> some updates on these efforts and seek if others are interested to
>> > >> collaborate.
>> > >>
>> > >> Jenkins - Was waiting for INFRA to create a high level folder[1] to
>> host
>> > >> all Geode jobs and have a basic unit test pipeline[2] that runs
>> tests on
>> > >> Ubuntu using Oracle JDK 8. While there are failing tests[3] that need
>> > some
>> > >> help, I am still exploring ways to write pipelines and include
>> multiple
>> > >> JDKs into a single JOB, so we can build using 8 while we test using
>> 11.
>> > >>
>> > >> GHA - While it is super quick and easy to start writing a pipeline[4]
>> > >> using
>> > >> "matrix and strategy", There are few tests[5] failing consistently
>> using
>> > >> Adopt JDK 8 on Ubuntu.
>> > >>
>> > >> [1] https://ci-builds.apache.org/job/Geode/
>> > >> [2]
>> > >>
>> > >>
>> >
>> https://github.com/apache/geode/blob/4abcdc06af536e1188e0f8c432e9b768c175d8c0/.jenkins/Jenkinsfile
>> > >> [3]
>> https://ci-builds.apache.org/job/Geode/job/geode-develop/11/console
>> > >> [4] https://github.com/apache/geode/pull/7870/files
>> > >> [5] https://github.com/apache/geode/actions/runs/3515795079
>> > >>
>> > >
>> >
>>
>


Re: rewrite CI pipeline

2022-12-02 Thread Kirk Lund
Sai, I'll be able to spend some time this weekend on these failing tests.
Hope to have some more news to share soon!

-Kirk

On Mon, Nov 21, 2022 at 4:35 PM Sai Boorlagadda 
wrote:

> I was thinking it has to do with either JDK or ubuntu. I also tried
> 'temurin'
> and there were different sets of errors. So to see if the devs can shed
> some light on JDK dependency or any prior knowledge on choosing
> bellsoft distribution.
>
> Sai
>
> On Mon, 21 Nov 2022 at 16:25, Kirk Lund  wrote:
>
> > It's also possible that these issues are specific to using OpenJDK 8 or
> > Ubuntu. I thought we were building and testing with CentOS before.
> >
> > On Mon, Nov 21, 2022 at 4:20 PM Kirk Lund  wrote:
> >
> > > Looks like membership errors. Membership went through extensive changes
> > to
> > > be modularized as geode-membership, and VMware was still fixing some
> bugs
> > > in it when I left. These could be really difficult to solve for those
> of
> > us
> > > who have no experience with the new membership.
> > >
> > > I wonder if we could engage VMware for a little more help on these
> > > failures before they depart entirely?
> > >
> > > Last I heard there were also problems with using OpenJDK 8 but I don't
> > > know the details. VMware was using Liberica JDK 8 from Bellsoft. We may
> > > also need VMware to explain what the issue was with OpenJDK 8 so we can
> > > figure out what to do about it.
> > >
> > > On Mon, Nov 21, 2022 at 3:29 PM Sai Boorlagadda <
> > sai.boorlaga...@gmail.com>
> > > wrote:
> > >
> > >> Hello devs,
> > >>
> > >> I started a parallel activity to tease out CI work on using ASF
> jenkins
> > >> and
> > >> also Github actions (though GHA has several issues). I wanted to
> provide
> > >> some updates on these efforts and seek if others are interested to
> > >> collaborate.
> > >>
> > >> Jenkins - Was waiting for INFRA to create a high level folder[1] to
> host
> > >> all Geode jobs and have a basic unit test pipeline[2] that runs tests
> on
> > >> Ubuntu using Oracle JDK 8. While there are failing tests[3] that need
> > some
> > >> help, I am still exploring ways to write pipelines and include
> multiple
> > >> JDKs into a single JOB, so we can build using 8 while we test using
> 11.
> > >>
> > >> GHA - While it is super quick and easy to start writing a pipeline[4]
> > >> using
> > >> "matrix and strategy", There are few tests[5] failing consistently
> using
> > >> Adopt JDK 8 on Ubuntu.
> > >>
> > >> [1] https://ci-builds.apache.org/job/Geode/
> > >> [2]
> > >>
> > >>
> >
> https://github.com/apache/geode/blob/4abcdc06af536e1188e0f8c432e9b768c175d8c0/.jenkins/Jenkinsfile
> > >> [3]
> https://ci-builds.apache.org/job/Geode/job/geode-develop/11/console
> > >> [4] https://github.com/apache/geode/pull/7870/files
> > >> [5] https://github.com/apache/geode/actions/runs/3515795079
> > >>
> > >
> >
>


Re: rewrite CI pipeline

2022-11-21 Thread Kirk Lund
It's also possible that these issues are specific to using OpenJDK 8 or
Ubuntu. I thought we were building and testing with CentOS before.

On Mon, Nov 21, 2022 at 4:20 PM Kirk Lund  wrote:

> Looks like membership errors. Membership went through extensive changes to
> be modularized as geode-membership, and VMware was still fixing some bugs
> in it when I left. These could be really difficult to solve for those of us
> who have no experience with the new membership.
>
> I wonder if we could engage VMware for a little more help on these
> failures before they depart entirely?
>
> Last I heard there were also problems with using OpenJDK 8 but I don't
> know the details. VMware was using Liberica JDK 8 from Bellsoft. We may
> also need VMware to explain what the issue was with OpenJDK 8 so we can
> figure out what to do about it.
>
> On Mon, Nov 21, 2022 at 3:29 PM Sai Boorlagadda 
> wrote:
>
>> Hello devs,
>>
>> I started a parallel activity to tease out CI work on using ASF jenkins
>> and
>> also Github actions (though GHA has several issues). I wanted to provide
>> some updates on these efforts and seek if others are interested to
>> collaborate.
>>
>> Jenkins - Was waiting for INFRA to create a high level folder[1] to host
>> all Geode jobs and have a basic unit test pipeline[2] that runs tests on
>> Ubuntu using Oracle JDK 8. While there are failing tests[3] that need some
>> help, I am still exploring ways to write pipelines and include multiple
>> JDKs into a single JOB, so we can build using 8 while we test using 11.
>>
>> GHA - While it is super quick and easy to start writing a pipeline[4]
>> using
>> "matrix and strategy", There are few tests[5] failing consistently using
>> Adopt JDK 8 on Ubuntu.
>>
>> [1] https://ci-builds.apache.org/job/Geode/
>> [2]
>>
>> https://github.com/apache/geode/blob/4abcdc06af536e1188e0f8c432e9b768c175d8c0/.jenkins/Jenkinsfile
>> [3] https://ci-builds.apache.org/job/Geode/job/geode-develop/11/console
>> [4] https://github.com/apache/geode/pull/7870/files
>> [5] https://github.com/apache/geode/actions/runs/3515795079
>>
>


Re: rewrite CI pipeline

2022-11-21 Thread Kirk Lund
Looks like membership errors. Membership went through extensive changes to
be modularized as geode-membership, and VMware was still fixing some bugs
in it when I left. These could be really difficult to solve for those of us
who have no experience with the new membership.

I wonder if we could engage VMware for a little more help on these failures
before they depart entirely?

Last I heard there were also problems with using OpenJDK 8 but I don't know
the details. VMware was using Liberica JDK 8 from Bellsoft. We may also
need VMware to explain what the issue was with OpenJDK 8 so we can figure
out what to do about it.

On Mon, Nov 21, 2022 at 3:29 PM Sai Boorlagadda 
wrote:

> Hello devs,
>
> I started a parallel activity to tease out CI work on using ASF jenkins and
> also Github actions (though GHA has several issues). I wanted to provide
> some updates on these efforts and seek if others are interested to
> collaborate.
>
> Jenkins - Was waiting for INFRA to create a high level folder[1] to host
> all Geode jobs and have a basic unit test pipeline[2] that runs tests on
> Ubuntu using Oracle JDK 8. While there are failing tests[3] that need some
> help, I am still exploring ways to write pipelines and include multiple
> JDKs into a single JOB, so we can build using 8 while we test using 11.
>
> GHA - While it is super quick and easy to start writing a pipeline[4] using
> "matrix and strategy", There are few tests[5] failing consistently using
> Adopt JDK 8 on Ubuntu.
>
> [1] https://ci-builds.apache.org/job/Geode/
> [2]
>
> https://github.com/apache/geode/blob/4abcdc06af536e1188e0f8c432e9b768c175d8c0/.jenkins/Jenkinsfile
> [3] https://ci-builds.apache.org/job/Geode/job/geode-develop/11/console
> [4] https://github.com/apache/geode/pull/7870/files
> [5] https://github.com/apache/geode/actions/runs/3515795079
>


Re: priority list

2022-11-21 Thread Kirk Lund
I have next to no experience with Jenkins but I'd be happy to get on some
sort of zoom or google or other screen sharing conference call to offer
what little help I can.

-Kirk

On Tue, Nov 15, 2022 at 2:19 PM Sai Boorlagadda 
wrote:

> Thanks Mark.
>
> Dan added me required permissions. So I will try to get a simple build step
> and configure the job.
>
> Will reach out for specific questions or issues. Do you or anyone have any
> project reference that uses Jenkins?
>
> Sai
>
> On Mon, 14 Nov 2022 at 07:01, Mark Bretl  wrote:
>
> > Jenkins CI in 2014 was limited to say the least, now Jenkins has actual
> > pipelines and stages, with parallel functionality, so it will be much
> > better this time around if we go that route. I think we could go back to
> > the basics, do a compile build and then add unit/basic/BVT tests on top
> > without too much trouble. I would say if we can get CI running for the
> main
> > Geode project with a compile in Jenkins, it would be a great first step.
> I
> > do have quite a bit of Jenkins experience, so I can definitely help out.
> >
> > --Mark
> >
> > On Sat, Nov 12, 2022 at 10:24 PM Kirk Lund  wrote:
> >
> > > Do we know why Geode has such a large CI resource requirement?
> > >
> > > I would guess that it was partially due to trying to run as many tests
> in
> > > parallel as possible to shorten the feedback cycle. The recent CI
> > pipelines
> > > were also built on Pivotal's Concourse which seems to promote a greater
> > > number of smaller CI jobs (or at least that's my impression).
> > >
> > > This code base did successfully use a Jenkins CI prior to 2014 even
> > though
> > > it took more hours to complete than the more recent Concourse CI. I
> think
> > > Mark Bretl was involved in that Jenkins CI so he might remember some
> > > details or tips or even possible challenges to watch out for.
> > >
> > > -Kirk
> > >
> > > On Sat, Nov 12, 2022 at 4:42 AM Mario Salazar de Torres
> > >  wrote:
> > >
> > > > Hi,
> > > > About GitHub actions, there are currently some limitations you I
> > pointed
> > > > out previously in the devlist.
> > > > Even tho, I was stuck in the process of migrating geode-native CI to
> GH
> > > > actions, mostly since I didn't have the necessary permissions.
> > > > If you want to have further info about GH actions, you can check
> Apache
> > > > BUILDS list.
> > > >
> > > > And as for Geode repository, considering the number of resources its
> CI
> > > > requires, I'd say GH actions is a no go...
> > > > Also, I think it was Dan Smith, the one that pointed out that Apache
> > has
> > > a
> > > > Jenkins instance available, so every Apache project can use its
> > > resources.
> > > > My guess is that Apache Jenkins infra would be a better fit for Geode
> > > > repository. Still, it remains to be seen, since resource requirements
> > on
> > > > that repository are really high.
> > > >
> > > > Sorry I couldn't be of more help, but at least I hope these pointers
> > are
> > > > useful.
> > > >
> > > > /Mario
> > > >
> > > > 
> > > > From: Niall Pemberton 
> > > > Sent: Saturday, November 12, 2022 10:46 AM
> > > > To: dev@geode.apache.org 
> > > > Subject: Re: priority list
> > > >
> > > > On Sat, Nov 12, 2022 at 3:24 AM Sai Boorlagadda <
> > > sai.boorlaga...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hello Devs,
> > > > >
> > > > > I would like to understand some of the top priority items that we
> > > should
> > > > > focus and spend our time on while we ramp up with new community
> > > members.
> > > > >
> > > > > Any pointers to the scope of the next release or a list of items we
> > > > should
> > > > > do as part of this transition. While going through the mailing list
> > > what
> > > > > immediately caught my eye is CI/CD migration to Github actions.
> > > > >
> > > >
> > > > I would say the number one priority is getting a CI instance setup. I
> > > guess
> > > > you've seen what Mario said about his efforts on GitHub actions?
> > > >
> > > > Niall
> > > >
> > > >
> > > > >
> > > > > Sai
> > > > >
> > > >
> > >
> >
>


Re: priority list

2022-11-12 Thread Kirk Lund
Do we know why Geode has such a large CI resource requirement?

I would guess that it was partially due to trying to run as many tests in
parallel as possible to shorten the feedback cycle. The recent CI pipelines
were also built on Pivotal's Concourse which seems to promote a greater
number of smaller CI jobs (or at least that's my impression).

This code base did successfully use a Jenkins CI prior to 2014 even though
it took more hours to complete than the more recent Concourse CI. I think
Mark Bretl was involved in that Jenkins CI so he might remember some
details or tips or even possible challenges to watch out for.

-Kirk

On Sat, Nov 12, 2022 at 4:42 AM Mario Salazar de Torres
 wrote:

> Hi,
> About GitHub actions, there are currently some limitations you I pointed
> out previously in the devlist.
> Even tho, I was stuck in the process of migrating geode-native CI to GH
> actions, mostly since I didn't have the necessary permissions.
> If you want to have further info about GH actions, you can check Apache
> BUILDS list.
>
> And as for Geode repository, considering the number of resources its CI
> requires, I'd say GH actions is a no go...
> Also, I think it was Dan Smith, the one that pointed out that Apache has a
> Jenkins instance available, so every Apache project can use its resources.
> My guess is that Apache Jenkins infra would be a better fit for Geode
> repository. Still, it remains to be seen, since resource requirements on
> that repository are really high.
>
> Sorry I couldn't be of more help, but at least I hope these pointers are
> useful.
>
> /Mario
>
> 
> From: Niall Pemberton 
> Sent: Saturday, November 12, 2022 10:46 AM
> To: dev@geode.apache.org 
> Subject: Re: priority list
>
> On Sat, Nov 12, 2022 at 3:24 AM Sai Boorlagadda  >
> wrote:
>
> > Hello Devs,
> >
> > I would like to understand some of the top priority items that we should
> > focus and spend our time on while we ramp up with new community members.
> >
> > Any pointers to the scope of the next release or a list of items we
> should
> > do as part of this transition. While going through the mailing list what
> > immediately caught my eye is CI/CD migration to Github actions.
> >
>
> I would say the number one priority is getting a CI instance setup. I guess
> you've seen what Mario said about his efforts on GitHub actions?
>
> Niall
>
>
> >
> > Sai
> >
>


Jira Contributors vs Committers vs "Assign User"

2022-09-21 Thread Kirk Lund
Contributors have the "Assignable" permission, but are lacking "Assign
User" which despite the text description means that they cannot assign a
ticket to themselves. Basically Contributors cannot assign anything to
anyone.

Ideally, we want to add "Assign User" to Contributors rather than keep
adding everyone who comes along to the Committers group.

The Apache Geode PMC and admins do not have the necessary permissions to
edit the Roles. We can only add, assign and remove Users. If we want to
edit Contributors to add "Assign User" we'll need to have one of us contact
ASF Infra to request this change to our Project in Jira.

Otherwise we have to assign all newcomers to the Committers group instead
of Contributors.

Note: I haven't changed any Users or Roles in Jira (I just have time to
quickly post this message right now).

-Kirk


CVE-2022-37023: Apache Geode deserialization of untrusted data flaw when using REST API on Java 8 or Java 11

2022-08-30 Thread Kirk Lund
Severity: high - possible RCE

Description:

Apache Geode versions prior to 1.15.0 are vulnerable to a deserialization of 
untrusted data flaw when using REST API on Java 8 or Java 11.

Any user wishing to protect against deserialization attacks involving REST APIs 
should upgrade to Apache Geode 1.15 and follow the documentation for details on 
enabling "validate-serializable-objects=true" and specifying any user classes 
that may be serialized/deserialized with "serializable-object-filter". Enabling 
"validate-serializable-objects" may impact performance.

Mitigation:

Disable affected services such as JMX over RMI or REST APIs unless they are 
required. REST APIs can be disabled by setting `http-service-port` to zero.



CVE-2022-37022: Apache Geode deserialization of untrusted data flaw when using JMX over RMI on Java 11

2022-08-30 Thread Kirk Lund
Severity: high - possible RCE

Description:

Apache Geode versions up to 1.12.2 and 1.13.2 are vulnerable to a 
deserialization of untrusted data flaw when using JMX over RMI on Java 11.

Any user wishing to protect against deserialization attacks involving JMX or 
RMI should upgrade to Apache Geode 1.15. Use of 1.15 on Java 11 will 
automatically protect JMX over RMI against deserialization attacks. This should 
have no impact on performance since it only affects JMX/RMI which Gfsh uses to 
communicate with the JMX Manager which is hosted on a Locator.

This issue is being tracked as GEODE-9064

Mitigation:

Disable affected services such as JMX over RMI unless they are required. JMX 
over RMI can be disabled by setting Geode property `jmx-manager` to false; this 
property defaults to false on Servers and true on Locators. 



CVE-2022-37021: Apache Geode deserialization of untrusted data flaw when using JMX over RMI on Java 8.

2022-08-30 Thread Kirk Lund
Severity: high - possible RCE

Description:

Apache Geode versions up to 1.12.5, 1.13.4 and 1.14.0 are vulnerable to a 
deserialization of untrusted data flaw when using JMX over RMI on Java 8. 

Any user still on Java 8 who wishes to protect against deserialization attacks 
involving JMX or RMI should upgrade to Apache Geode 1.15 and Java 11. 

If upgrading to Java 11 is not possible, then upgrade to Apache Geode 1.15 and 
specify "--J=-Dgeode.enableGlobalSerialFilter=true" when starting any Locators 
or Servers. Follow the documentation for details on specifying any user classes 
that may be serialized/deserialized with the "serializable-object-filter" 
configuration option. Using a global serial filter will impact performance.

This issue is being tracked as GEODE-9758

Mitigation:

Disable affected services such as JMX over RMI unless they are required. JMX 
over RMI can be disabled by setting Geode property `jmx-manager` to false; this 
property defaults to false on Servers and true on Locators. 



Prevent common causes of flaky tests

2022-06-01 Thread Kirk Lund
How to prevent common causes of flakiness in tests:

1. Use the correct timeout
  a. Always use GeodeAwaitility.getTimeout() for all timeouts in tests even
if you think it should be shorter.

2. Use a free port or disable the port instead of using any default ports
  a. Use AvailablePortHelper to get free ports to use instead of using
default ports. Remember: if you didn’t specify or disable the port, it’s
using the default.
  b. [server] Specify --server-port=n when starting servers in a test with
clients. Use --disable-default-server to disable it.
  c. [locator] Specify --port=n when starting locators. (I don’t think this
can be disabled)
  d. [locator] Specify --http-service-port=n when starting locators that
need Pulse/REST/HTTP. Use zero as value to disable it.
  e. [locator] Specify --J=-Dgemfire.jmx-manager-port=n in a test that uses
JMX or GFSH. Use --
J=-Dgemfire.jmx-manager=false to disable it.
  f. If the test needs to use APIs and gemfire.properties instead of Gfsh
switches:
i. --server-port is specified using CacheServerFactory and CacheServer
APIs
ii. --port is specified using the Locator and InternalLocator APIs
iii. --http-service-port is specified using http-service-port
iv. JMX Manager is configured with jmx-manager-port and jmx-manager

3. Enable only the services the test actually uses
  a. Don’t enable cluster config, REST API, etc if the test doesn’t use the
service.

4. Restart needs to await full-stop before restarting with the same ports
  a. Use GfshStopper for some new handy utilities (introduced by
GEODE-10327).


Re: [PROPOSAL] Remove warning logs from FunctionException

2022-05-13 Thread Kirk Lund
Hi Alberto,

I don't really know if this change is good or bad but I do have some
thoughts about logging and changing behavior like this.

First off, we might think about whether or not logging these warnings is
currently expected behavior. Would removing/changing it cause surprise for
any users upgrading to the version that removes it? Would it make things
more difficult for the vendor or Apache Geode to provide support to a user
having issues in FunctionExecution?

It might make more sense to refactor the code that issues these warnings to
use a pluggable mechanism for handling exceptions and error conditions in
FunctionExecution? For example, you might consider introducing an SPI for a
new FunctionExecutionErrorHandler. The default implementation would log the
exceptions at warning level. Since it's an SPI (using Java Service Loader
to load an implementation), you could provide your own implementation that
does anything you want in response, including the possibility of logging it
at debug level instead of warning.

This also highlights a bigger issue in Apache Geode. There has never been a
policy or even set of recommendations on the logging performed by Geode
classes and components. There is also no standard or recommendations for
formatting or wording of log statements. I believe the issue involving
exception logging in FunctionExecution is a direct result of this. Geode
needs detailed guidelines about log levels, log statement wording,
inclusion of exception stacks, use of Markers, etc. Not just per class, but
probably per Geode component or service or layer as well. Geode also needs
some devs to spend time experimenting with specifying custom log4j2.xml
configuration files, finding usability problems and hopefully improving how
this currently works so that it becomes easy for users to customize what
information is actually being logged.

For example, there are problems with how Log4j is currently used by Geode
such as using Markers at TRACE or DEBUG levels instead of at INFO level --
in order to enable Marker controlled logging, one would have to set the
log-level to DEBUG and then also enable the Marker. Since we know DEBUG
level already produces too much logging (much of it useless noise), this
just exacerbates the problem when what is really wanted is well-behaved,
consistent INFO level logging AND the ability to enable Marked controlled
logging without having to use a finer log-level. It's possible the solution
to Marker usage in Geode is to move Marker controlled log statements to
INFO level. That way you could leave the logging at INFO level and just
enable one or more Markers to increase logging from specific areas.

I primarily wanted to point out that you may be trying to address a smaller
part of the real problem here and that you should always consider pluggable
mechanisms for customizing behavior rather than just considering changes
that would directly alter the current behavior. Also, remember to think
about whether a change like this would be best in a major, minor or patch
release. Some users or the devs supporting them may rely on this
information (I'm not saying they do in this case, just that it's possible).

Thanks,
Kirk

On Thu, Apr 28, 2022 at 3:00 AM Alberto Gomez 
wrote:

> Hi Barry,
>
> If the exception is returned by passing it to the ResultCollector's
> sendException() method then the exception is not logged. If the exception
> is returned by passing it to the lastResult() method then the exception
> (and the stack trace) is logged. I am assuming that when you say that the
> exception is returned in its result is done by means of the sendException()
> method.
>
> I agree with you that Geode must be consistent and if an exception is
> thrown by the function, then the exception should be logged no matter if
> isHA is returning false or true. Like you, I have also observed that when
> isHA is returning false the exception is not logged.
>
> I also think it is worth to at least make this logging of the exception
> configurable for those cases where functions prefer to throw the exception
> instead of sending it and still do not want to see those exceptions logged.
>
> Thanks,
>
> Alberto
> 
> From: Barry Oglesby 
> Sent: Thursday, April 28, 2022 2:32 AM
> To: Alberto Gomez ; dev@geode.apache.org <
> dev@geode.apache.org>
> Subject: Re: [PROPOSAL] Remove warning logs from FunctionException
>
> A function can throw an exception and can also return an exception in its
> result. I'm not sure I've seen too many functions where throwing an
> exception is the expected result. In my very quick testing, I see the
> exception and stack logged if the exception is thrown by the function but
> not if the exception is returned. Are you seeing that same behavior, or are
> both cases logging the exception? I also see the behavior you described
> where isHA returning false does not log the exception. I guess I would say
> if an exception is thrown in either 

Writing acceptance tests with a server but no clients

2022-05-04 Thread Kirk Lund
Since the server doesn't have any clients connecting to it, you can simply
specify --disable-default-server instead of using --server-port.

This reduces the risk of any port conflicts and also speeds up starting
that server.


distributedTests should go in src/distributedTest

2022-05-02 Thread Kirk Lund
Just a quick reminder that all distributedTests should go in
src/distributedTest.

A test should only be considered an acceptanceTest if it uses GfshRule to
launch the locator and server processes, and it should only use User APIs
for everything (configuration, initiating operations and validating
results). If the test cannot be performed without resorting to using
VM.invoke or something like that to poke at the process, then please just
make it a distributedTest rather than adding it to src/acceptanceTest.

Thanks,
Kirk


Re: Commit message review opt-in

2022-03-08 Thread Kirk Lund
So, we just submit edits to COMMITWATCHERS going forward?

Thanks Owen!

On Tue, Mar 8, 2022 at 11:07 AM Owen Nichols  wrote:

> To make opt-in and opt-out less noisy for the dev list, the subscription
> mechanism has been moved to COMMITWATCHERS file under source control to
> provide self-service (like with CODEOWNERS or CODEWATCHERS).
>
> From: Jacob Barrett 
> Date: Friday, March 4, 2022 at 1:20 PM
> To:
> Subject: Re: Commit message review opt-in
> Can we please agree to send requests for inclusion directly to
> onich...@vmware.com rather than back to this
> thread.
>
> On Mar 4, 2022, at 1:09 PM, Owen Nichols  onich...@vmware.com>> wrote:
>
> Thanks Udo, I have added kohlmu-pivotal as requested, for non-draft PRs
> only.
>
> From: Udo Kohlmeyer mailto:u...@vmware.com>>
> Date: Friday, March 4, 2022 at 1:03 PM
> To: dev@geode.apache.org <
> dev@geode.apache.org>
> Subject: Re: Commit message review opt-in
> I’ll opt-in please.
>
> --Udo
>
> From: Owen Nichols mailto:onich...@vmware.com>>
> Date: Saturday, February 26, 2022 at 10:44 AM
> To: geode mailto:dev@geode.apache.org>>
> Subject: Commit message review opt-in
> If you received an automated-looking PR review comment from
> onichols-pivotal in the past few weeks without consent, I apologize and
> have withdrawn it.
>
> If you would like to continue receiving these automated reviews, you may
> opt-in by emailing me your github username (also let me know if you want
> your Draft PRs included or not).
>
> Geode’s guidelines for commit messages should be interpreted as
> suggestions, not rules.  All Geode committers are welcomed with "we like to
> work on trust, not unnecessary restrictions".  I hope this change to opt-in
> better reflects that spirit.
>


Re: Commit message review opt-in

2022-03-03 Thread Kirk Lund
I'd like to opt-in for this Owen. Thanks!

-Kirk

On Fri, Feb 25, 2022 at 3:43 PM Owen Nichols  wrote:

> If you received an automated-looking PR review comment from
> onichols-pivotal in the past few weeks without consent, I apologize and
> have withdrawn it.
>
> If you would like to continue receiving these automated reviews, you may
> opt-in by emailing me your github username (also let me know if you want
> your Draft PRs included or not).
>
> Geode’s guidelines for commit messages should be interpreted as
> suggestions, not rules.  All Geode committers are welcomed with "we like to
> work on trust, not unnecessary restrictions".  I hope this change to opt-in
> better reflects that spirit.
>


Re: Question regarding geode thread priorities

2022-01-26 Thread Kirk Lund
PS: Sorry if I didn't realize what BRs is :D

On Wed, Jan 26, 2022 at 3:39 PM Kirk Lund  wrote:

> Hi BRs/Jakov,
>
> I'm familiar with most of these threads, and these ones I know of do not
> spawn more than one thread total. Most of these are quite old, possibly
> predating Executors in Java. I doubt using max priority is important for
> these threads, but you should probably do some perf testing if you want to
> remove setMaxPriority. I recommend using
> https://github.com/apache/geode-benchmarks as well as writing targeted
> JMH micro-benchmarks.
>
> Cheers,
> Kirk
>
> On Mon, Jan 24, 2022 at 3:50 AM Jakov Varenina 
> wrote:
>
>> Hi community,
>>
>> We have came across to some code in geode that prioritizes some of the
>> threads using
>>
>> https://cr.openjdk.java.net/~iris/se/11/latestSpec/api/java.base/java/lang/Thread.html#setPriority(int).
>>
>> Below you can find links to code.
>>
>>
>> https://github.com/apache/geode/blob/41eb49989f25607acfcbf9ac5afe3d4c0721bb35/geode-core/src/main/java/org/apache/geode/internal/statistics/HostStatSampler.java#L304
>>
>>
>> https://github.com/apache/geode/blob/41eb49989f25607acfcbf9ac5afe3d4c0721bb35/geode-core/src/main/java/org/apache/geode/internal/cache/control/OffHeapMemoryMonitor.java#L92
>>
>>
>> https://github.com/apache/geode/blob/d79a3c78eab96a9e760db07fa42580e61586b9c5/geode-core/src/main/java/org/apache/geode/internal/cache/control/InternalResourceManager.java#L147
>>
>>
>> https://github.com/apache/geode/blob/a5bd36f9fa787d3a71c6e6efafed5a7b0fe52d2b/geode-core/src/main/java/org/apache/geode/internal/tcp/TCPConduit.java#L343
>>
>> Just to add that every new thread inherits parent thread priority, so
>> that means that there will be more thread with max priority in addition
>> to the above threads. Does somebody know why this is set for these
>> particular threads?
>>
>> Additionally, in multiple online resources it is indicated that these
>> priories are not taken into the account by the Linux scheduler unless
>> additional parameters in JVM are set (UseThreadPriorities and
>> ThreadPriorityPolicy), please check links for more information's:
>>
>> https://github.com/openjdk/jdk/blob/jdk8-b120/hotspot/src/share/vm/runtime/globals.hpp#L3369,L3392
>> and
>>
>> https://github.com/openjdk/jdk/blob/jdk8-b120/hotspot/src/os/linux/vm/os_linux.cpp#L3961,L3966
>>
>> Are these priorities that are set in Apache Geode code crucial and
>> should be enabled for better performance, or they shouldn't be used?
>> Also did I maybe miss something and these priorities are somehow used
>> even without setting mentioned JVM parameters?
>>
>> Any help on this topic is welcome and sorry for bothering!
>>
>> BRs/Jakov
>>
>>


Re: Question regarding geode thread priorities

2022-01-26 Thread Kirk Lund
Hi BRs/Jakov,

I'm familiar with most of these threads, and these ones I know of do not
spawn more than one thread total. Most of these are quite old, possibly
predating Executors in Java. I doubt using max priority is important for
these threads, but you should probably do some perf testing if you want to
remove setMaxPriority. I recommend using
https://github.com/apache/geode-benchmarks as well as writing targeted JMH
micro-benchmarks.

Cheers,
Kirk

On Mon, Jan 24, 2022 at 3:50 AM Jakov Varenina 
wrote:

> Hi community,
>
> We have came across to some code in geode that prioritizes some of the
> threads using
>
> https://cr.openjdk.java.net/~iris/se/11/latestSpec/api/java.base/java/lang/Thread.html#setPriority(int).
>
> Below you can find links to code.
>
>
> https://github.com/apache/geode/blob/41eb49989f25607acfcbf9ac5afe3d4c0721bb35/geode-core/src/main/java/org/apache/geode/internal/statistics/HostStatSampler.java#L304
>
>
> https://github.com/apache/geode/blob/41eb49989f25607acfcbf9ac5afe3d4c0721bb35/geode-core/src/main/java/org/apache/geode/internal/cache/control/OffHeapMemoryMonitor.java#L92
>
>
> https://github.com/apache/geode/blob/d79a3c78eab96a9e760db07fa42580e61586b9c5/geode-core/src/main/java/org/apache/geode/internal/cache/control/InternalResourceManager.java#L147
>
>
> https://github.com/apache/geode/blob/a5bd36f9fa787d3a71c6e6efafed5a7b0fe52d2b/geode-core/src/main/java/org/apache/geode/internal/tcp/TCPConduit.java#L343
>
> Just to add that every new thread inherits parent thread priority, so
> that means that there will be more thread with max priority in addition
> to the above threads. Does somebody know why this is set for these
> particular threads?
>
> Additionally, in multiple online resources it is indicated that these
> priories are not taken into the account by the Linux scheduler unless
> additional parameters in JVM are set (UseThreadPriorities and
> ThreadPriorityPolicy), please check links for more information's:
>
> https://github.com/openjdk/jdk/blob/jdk8-b120/hotspot/src/share/vm/runtime/globals.hpp#L3369,L3392
> and
>
> https://github.com/openjdk/jdk/blob/jdk8-b120/hotspot/src/os/linux/vm/os_linux.cpp#L3961,L3966
>
> Are these priorities that are set in Apache Geode code crucial and
> should be enabled for better performance, or they shouldn't be used?
> Also did I maybe miss something and these priorities are somehow used
> even without setting mentioned JVM parameters?
>
> Any help on this topic is welcome and sorry for bothering!
>
> BRs/Jakov
>
>


Re: [Suspected Spam] [VOTE] Apache Geode 1.13.7.RC1

2022-01-19 Thread Kirk Lund
+1 same as Naba

On Wed, Jan 19, 2022 at 4:26 PM Nabarun Nag  wrote:

> +1 based on the results of the pipeline tests.
>
> Regards
> Nabarun Nag
>
> 
> From: Dick Cavender 
> Sent: Monday, January 17, 2022 11:20 AM
> To: dev@geode.apache.org 
> Subject: [Suspected Spam] [VOTE] Apache Geode 1.13.7.RC1
>
> Hello Geode Dev Community,
>
> This is a release candidate for Apache Geode version 1.13.7.RC1.
> Thanks to all the community members for their contributions to this
> release!
>
> Please do a review and give your feedback, including the checks you
> performed.
>
> Voting deadline:
> 3PM PST Thursday, January 20 2022.
>
> Please note that we are voting upon the source tag:
> rel/v1.13.7.RC1
>
> Release notes:
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FRelease%2BNotes%23ReleaseNotes-1.13.7data=04%7C01%7Cnnag%40vmware.com%7Ca9dee05fec1f4851e8ca08d9d9ee6cea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637780440337254741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=V5yh9gXa39eKG7MY0uXruCe0qgODD3ef94YwMJcoKRw%3Dreserved=0
>
> Source and binary distributions:
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fgeode%2F1.13.7.RC1%2Fdata=04%7C01%7Cnnag%40vmware.com%7Ca9dee05fec1f4851e8ca08d9d9ee6cea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637780440337254741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=NhGIBZQCPTvIdXFaxCTYqatC13cJkrjyx4AVgKk5698%3Dreserved=0
>
> Maven staging repo:
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Frepositories%2Forgapachegeode-1126data=04%7C01%7Cnnag%40vmware.com%7Ca9dee05fec1f4851e8ca08d9d9ee6cea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637780440337254741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=rpF5i%2B1MUSkt%2FTU%2BKSv9d%2BjLiO3Vt8KPHXhSnMcBfhM%3Dreserved=0
>
> GitHub:
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode%2Ftree%2Frel%2Fv1.13.7.RC1data=04%7C01%7Cnnag%40vmware.com%7Ca9dee05fec1f4851e8ca08d9d9ee6cea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637780440337254741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=cDmYAZN8vKLs5bRVmOPDGw6YshASiNiXX9I7PoxGZyc%3Dreserved=0
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode-examples%2Ftree%2Frel%2Fv1.13.7.RC1data=04%7C01%7Cnnag%40vmware.com%7Ca9dee05fec1f4851e8ca08d9d9ee6cea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637780440337254741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=6WKvYUp52vm0qhBeNJKkAZYMiagI7hp1d7UIizTvAqw%3Dreserved=0
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode-native%2Ftree%2Frel%2Fv1.13.7.RC1data=04%7C01%7Cnnag%40vmware.com%7Ca9dee05fec1f4851e8ca08d9d9ee6cea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637780440337254741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=ZsDwPqv6gq%2B9z9g30M5xHAtT2e3T5AepXheqob2m8j0%3Dreserved=0
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode-benchmarks%2Ftree%2Frel%2Fv1.13.7.RC1data=04%7C01%7Cnnag%40vmware.com%7Ca9dee05fec1f4851e8ca08d9d9ee6cea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637780440337254741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=hCAbePGsAZllyrXVhQoiuq%2BBANvLe%2BcpQQ0puQCeps0%3Dreserved=0
>
> Pipelines:
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fconcourse.apachegeode-ci.info%2Fteams%2Fmain%2Fpipelines%2Fapache-support-1-13-maindata=04%7C01%7Cnnag%40vmware.com%7Ca9dee05fec1f4851e8ca08d9d9ee6cea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637780440337254741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=7C8JYijQu0VFpObgZj3GQDI%2FicP48lFcbn7Ji60%2FuXI%3Dreserved=0
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fconcourse.apachegeode-ci.info%2Fteams%2Fmain%2Fpipelines%2Fapache-support-1-13-rcdata=04%7C01%7Cnnag%40vmware.com%7Ca9dee05fec1f4851e8ca08d9d9ee6cea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637780440337254741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=MAqZ9jupVC%2B52z%2B7YaJZdbn7sZ4Bb36h0tbj1aH8YUM%3Dreserved=0
>
> Geode's KEYS file containing PGP keys we use to sign the release:
>
> 

CVE-2021-34797: Apache Geode project log file redaction of sensitive information vulnerability

2022-01-03 Thread Kirk Lund
Severity: low

Description:

Apache Geode versions up to 1.12.4 and 1.13.4 are vulnerable to a log file
redaction of sensitive information flaw when using values that begin with
characters other than letters or numbers for passwords and security
properties with the prefix "sysprop-", "javax.net.ssl", or "security-".
This issue is fixed by overhauling the log file redaction in Apache Geode
versions 1.12.5, 1.13.5, and 1.14.0.

This issue is being tracked as GEODE-9354

References:

https://lists.apache.org/thread/p4l0g49rzzzpn8yt9q9p0xp52h3zmsmk
https://lists.apache.org/thread/nq2w9gjzm1cjx1rh6zw41ty39qw7qpx4
https://issues.apache.org/jira/browse/GEODE-9354

Credit:

Apache Geode would like to thank Aaron Lindsey for reporting this issue.


Re: [DISCUSS] Adding LTGM as gating PR checks

2021-12-22 Thread Kirk Lund
I support making LGTM a gating job on every PR.

On Thu, Dec 16, 2021 at 2:46 PM Alexander Murmann 
wrote:

> +1 to adding this. Given it has a low false-positive rate, only checks on
> code actually changed in the PR and runs quickly compared to some of our
> other steps that already happen for every PR, this seems like an easy
> decision.
>
> 
> From: Robert Houghton 
> Sent: Thursday, December 16, 2021 14:20
> To: dev@geode.apache.org 
> Subject: Re: [DISCUSS] Adding LTGM as gating PR checks
>
> Short answer would be to work with the rest of the community to get the
> check to pass, fix the LGTM configuration, something like that. Otherwise,
> the Concourse CI has the ability to set PR context messages.
>
> -Robert
>
> From: Owen Nichols 
> Date: Thursday, December 16, 2021 at 10:40 AM
> To: dev@geode.apache.org 
> Subject: Re: [DISCUSS] Adding LTGM as gating PR checks
> Requiring LGTM looks good to me.  It does not seem to have a high rate of
> false-positives like some other linters, but if we are making it gating,
> what would the process look like to override a false-positive?
>
> On 12/16/21, 10:37 AM, "Anthony Baker"  wrote:
>
> Thanks Robert, I think this is important. I think this is a good first
> step.
>
> In future I think we should consider adding a CI job to ensure that
> pre-existing security errors are addressed. Perhaps GitHub code scanning is
> worth investigating since they have acquired the LGTM product.
>
> Anthony
>
>
> > On Dec 16, 2021, at 10:08 AM, Robert Houghton 
> wrote:
> >
> > We have had LGTM tests enabled on Apache Geode PRs for quite some
> time, and have done a great job of trending those warnings and errors to in
> the right direction. I would like to make the change to our GitHub to make
> those changes blocking for all new PRs, given their reliability and
> lack-of-flakiness.
> >
> > Does anyone have strong feelings against that?
> >
> > -Robert Houghton
>
>


Re: [VOTE] Apache Geode 1.14.1.RC2

2021-12-10 Thread Kirk Lund
+1 to release 1.14.1.RC2

I reviewed pipeline results and binaries.

PS: I'm not sure what the "upthewaterspout" job in
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-14-rc
is
all about. I'd prefer to have that job renamed for future releases.

On Fri, Dec 10, 2021 at 2:09 PM Owen Nichols  wrote:

> Hello Geode Dev Community,
>
> This is a release candidate for Apache Geode version 1.14.1.RC2.
> Thanks to all the community members for their contributions to this
> release!
>
> Please do a review and give your feedback, including the checks you
> performed.
>
> EXPEDITED Voting deadline:
> 3PM PST Sat, December 11 2021.
>
> Please note that we are voting upon the source tag:
> rel/v1.14.1.RC2
>
> Release notes:
>
> https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.14.1
>
> Source and binary distributions:
> https://dist.apache.org/repos/dist/dev/geode/1.14.1.RC2/
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachegeode-1118
>
> GitHub:
> https://github.com/apache/geode/tree/rel/v1.14.1.RC2
> https://github.com/apache/geode-examples/tree/rel/v1.14.1.RC2
> https://github.com/apache/geode-native/tree/rel/v1.14.1.RC2
> https://github.com/apache/geode-benchmarks/tree/rel/v1.14.1.RC2
>
> Pipelines:
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-14-main
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-14-rc
>
> Geode's KEYS file containing PGP keys we use to sign the release:
> https://github.com/apache/geode/blob/develop/KEYS
>
> Command to run geode-examples:
> ./gradlew -PgeodeReleaseUrl=
> https://dist.apache.org/repos/dist/dev/geode/1.14.1.RC2
> -PgeodeRepositoryUrl=
> https://repository.apache.org/content/repositories/orgapachegeode-1118
> build runAll
>
> Regards
> Owen Nichols
>


Re: [VOTE] Apache Geode 1.13.5.RC2

2021-12-10 Thread Kirk Lund
+1 to release 1.13.5.RC2

I reviewed pipeline results and binaries. distributed-test-openjdk8
encountered a known JMX bug that exists in all released versions.

On Fri, Dec 10, 2021 at 2:09 PM Owen Nichols  wrote:

> Hello Geode Dev Community,
>
> This is a release candidate for Apache Geode version 1.13.5.RC2.
> Thanks to all the community members for their contributions to this
> release!
>
> Please do a review and give your feedback, including the checks you
> performed.
>
> EXPEDITED Voting deadline:
> 3PM PST Sat, December 11 2021.
>
> Please note that we are voting upon the source tag:
> rel/v1.13.5.RC2
>
> Release notes:
>
> https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.13.5
>
> Source and binary distributions:
> https://dist.apache.org/repos/dist/dev/geode/1.13.5.RC2/
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachegeode-1117
>
> GitHub:
> https://github.com/apache/geode/tree/rel/v1.13.5.RC2
> https://github.com/apache/geode-examples/tree/rel/v1.13.5.RC2
> https://github.com/apache/geode-native/tree/rel/v1.13.5.RC2
> https://github.com/apache/geode-benchmarks/tree/rel/v1.13.5.RC2
>
> Pipelines:
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-13-main
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-13-rc
>
> Geode's KEYS file containing PGP keys we use to sign the release:
> https://github.com/apache/geode/blob/develop/KEYS
>
> Command to run geode-examples:
> ./gradlew -PgeodeReleaseUrl=
> https://dist.apache.org/repos/dist/dev/geode/1.13.5.RC2
> -PgeodeRepositoryUrl=
> https://repository.apache.org/content/repositories/orgapachegeode-1117
> build runAll
>
> Regards
> Owen Nichols
>


Re: [VOTE] Apache Geode 1.12.6.RC2

2021-12-10 Thread Kirk Lund
+1 to release 1.12.6.RC2

I reviewed pipeline results and binaries.

On Fri, Dec 10, 2021 at 2:09 PM Owen Nichols  wrote:

> Hello Geode Dev Community,
>
> This is a release candidate for Apache Geode version 1.12.6.RC2.
> Thanks to all the community members for their contributions to this
> release!
>
> Please do a review and give your feedback, including the checks you
> performed.
>
> EXPEDITED Voting deadline:
> 3PM PST Sat, December 11 2021.
>
> Please note that we are voting upon the source tag:
> rel/v1.12.6.RC2
>
> Release notes:
>
> https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.12.6
>
> Source and binary distributions:
> https://dist.apache.org/repos/dist/dev/geode/1.12.6.RC2/
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachegeode-1119
>
> GitHub:
> https://github.com/apache/geode/tree/rel/v1.12.6.RC2
> https://github.com/apache/geode-examples/tree/rel/v1.12.6.RC2
> https://github.com/apache/geode-native/tree/rel/v1.12.6.RC2
> https://github.com/apache/geode-benchmarks/tree/rel/v1.12.6.RC2
>
> Pipelines:
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-12-main
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-12-rc
>
> Geode's KEYS file containing PGP keys we use to sign the release:
> https://github.com/apache/geode/blob/develop/KEYS
>
> Command to run geode-examples:
> ./gradlew -PgeodeReleaseUrl=
> https://dist.apache.org/repos/dist/dev/geode/1.12.6.RC2
> -PgeodeRepositoryUrl=
> https://repository.apache.org/content/repositories/orgapachegeode-1119
> build runAll
>
> Regards
> Owen Nichols
>


Re: @TestOnly or @VisibleForTesting

2021-11-05 Thread Kirk Lund
For constructors, I suggest that we don't need to use either annotation. A
package-private constructor that accepts parameters for all dependencies is
not breaking encapsulation or exposing internals and is very appropriate
for constructor based injection. It adheres to best practices as set out
for OOP and TDD for both new code and legacy code. The key best practice
for this is to make sure multiple constructors use constructor chaining.

-Kirk

On Fri, Nov 5, 2021 at 11:58 AM Owen Nichols  wrote:

> @VisibileForTesting is bad.  It does NOT tell you if the method SHOULD
> HAVE BEEN private, or package-private, or protected, so there is no way to
> reconstruct the originally-intended code if the annotation is ever
> removed.  It also prevents the exposed methods from being named in a way
> that clearly indicates that user code should never call them.
>
> @TestOnly is good.  It tells you that for production, it is safe to remove
> the method entirely.  We might someday like automatically remove all
> @TestOnly methods when we ship official Geode releases -- not just to save
> bytes as Mark mentioned, but TO AVOID EXPOSING PRIVATE INTERALS (or worse,
> exposing the ability to manipulate and modify private internals!)
>
>
> ANY use of @VisibileForTesting CAN be rewritten to use @TestOnly instead,
> for example:
>
>   @VisibileForTesting
>   public Foo getInternalStuff() {}
>
> Could instead be:
>
>   private Foo getInternalStuff() {}
>
>   @TestOnly
>   public Foo getInternalStuffForTestOnly() {
> return getInternalStuff();
>   }
>
>
> That said, I understand that *having tests* is more valuable than having
> perfect encapsulation, so if the extra 4 lines of code needed to do the
> right thing is a disincentive to writing tests...I'd rather have the
> tests...
>
>
> On 11/5/21, 11:37 AM, "Kirk Lund"  wrote:
>
> I'm ok with not using these annotations, but we still need to write
> *unit
> tests* in addition to end-to-end *system tests*.
>
> I started this thread primarily to standardize on how we use one or
> both
> annotations. We shouldn't be using both arbitrarily without some sort
> of
> guidelines.
>
> If we want to actually stop using these annotations, then someone who
> feels
> strongly about removing them should volunteer to submit a pull request
> that
> *removes* all usages from the codebase. However, the purpose of these
> annotations is to document some important information about the code
> for
> better understanding by the developers working on it. So, please
> realize
> that we will be losing this information. Also, removing the annotations
> doesn't change the fact that we still need constructors and methods
> with
> scopes that allow for testing.
>
> Jinmei's description matches how I would want to use these annotations
> (if
> we want to use both):
>
> My understanding is @VisibileForTesting methods are used by the
> products,
> > while @TestOnly methods are used only by the tests.
> >
> > In practice, I don’t like to add @TestOnly methods (although I like
> to
> > mark those methods with this annotation if I found out a method is
> only
> > used for testing for better identification), but I do find it useful
> to add
> > @VisibleForTesting methods while making unit tests.
> >
>
> Dale's description represents a good usage of *@VisibleForTesting* for
> chaining constructors to inject dependencies for unit testing:
>
> Kirk and I often use @VisibleForTesting as part of a technique for
> making
> > complex code more testable.
> >
> > Many classes have “hidden” dependencies that make that class hard to
> test.
> > A useful technique is to add a constructor parameter to allow a test
> to
> > inject a more controllable, more observable instance.
> >
> > But we don’t want to force every use of the existing constructor to
> have
> > to create that instance. So we leave the old constructor’s signature
> as is,
> > create a new constructor that accepts that new parameter, and make
> the old
> > constructor call the new one.
> >
> > This new, intermediate constructor is called only by the old
> constructor
> > and by tests. In order for tests to call it, we have to make it
> visible, so
> > we mark it as @VisibleForTesting.
> >
>
> If a large constructor such as the one for *PartitionedRegion*
> constructs
> its own dependencies using constructors new Dependency(...) and static
> factories *Dependency.create*, then 

Re: @TestOnly or @VisibleForTesting

2021-11-05 Thread Kirk Lund
I'm ok with not using these annotations, but we still need to write *unit
tests* in addition to end-to-end *system tests*.

I started this thread primarily to standardize on how we use one or both
annotations. We shouldn't be using both arbitrarily without some sort of
guidelines.

If we want to actually stop using these annotations, then someone who feels
strongly about removing them should volunteer to submit a pull request that
*removes* all usages from the codebase. However, the purpose of these
annotations is to document some important information about the code for
better understanding by the developers working on it. So, please realize
that we will be losing this information. Also, removing the annotations
doesn't change the fact that we still need constructors and methods with
scopes that allow for testing.

Jinmei's description matches how I would want to use these annotations (if
we want to use both):

My understanding is @VisibileForTesting methods are used by the products,
> while @TestOnly methods are used only by the tests.
>
> In practice, I don’t like to add @TestOnly methods (although I like to
> mark those methods with this annotation if I found out a method is only
> used for testing for better identification), but I do find it useful to add
> @VisibleForTesting methods while making unit tests.
>

Dale's description represents a good usage of *@VisibleForTesting* for
chaining constructors to inject dependencies for unit testing:

Kirk and I often use @VisibleForTesting as part of a technique for making
> complex code more testable.
>
> Many classes have “hidden” dependencies that make that class hard to test.
> A useful technique is to add a constructor parameter to allow a test to
> inject a more controllable, more observable instance.
>
> But we don’t want to force every use of the existing constructor to have
> to create that instance. So we leave the old constructor’s signature as is,
> create a new constructor that accepts that new parameter, and make the old
> constructor call the new one.
>
> This new, intermediate constructor is called only by the old constructor
> and by tests. In order for tests to call it, we have to make it visible, so
> we mark it as @VisibleForTesting.
>

If a large constructor such as the one for *PartitionedRegion* constructs
its own dependencies using constructors new Dependency(...) and static
factories *Dependency.create*, then that prevents unit testing.

Untestable class:

*public UntestableClass() {*
*  // fetches its own dependencies*
*  dependency1 = new Dependency1();*

*  dependency2 = Dependency2.create(...);}*

Testable class:

*public TestableClass(Dependency1 dependency1, Dependency2 dependency2) {*
*  // all dependencies are passed in*
*  this.dependency1 = dependency1;*
*  this.dependency2 = dependency2;*
*}*

Now if you're trying to write unit tests for a class like
*PartitionedRegion*, the best technique is to use constructor chaining so
that the last constructor accepts all dependencies:

*public RefactoredClass() {*
*  this(new Dependency1(), Dependency2.create(...));*
*}*


*RefactoredClass(Dependency1 dependency1, Dependency2 dependency2) {*
*  // all dependencies are passed in*
*  ...*
*}*

Most of us have been applying *@VisibleForTesting* to the package-private
constructor that accepts all dependencies. Using the annotation doesn't
change the fact that we need to pass in the dependencies for unit testing.
The only other option is extremely large pull requests that change that
first constructor to require ALL dependencies.

-Kirk

On Thu, Nov 4, 2021 at 3:13 PM Kirk Lund  wrote:

> As everyone thinks about how we want to use these annotations, please keep
> this in mind that both *@VisibleForTesting* and *@TestOnly* can be used
> on Types (Class/Interface/Enum), Constructors, Methods and Fields. (not
> just Methods)
>
> On Thu, Nov 4, 2021 at 3:09 PM Kirk Lund  wrote:
>
>> Hey all,
>>
>> We're introducing a mess to the codebase. It's a small problem, but
>> several small problems become a big problem and one of my missions is to
>> clean up and improve the codebase.
>>
>> I recently started seeing lots of pull requests with usage of @TestOnly.
>> Sometimes it's used instead of @VisibleForTesting, while sometimes I see
>> both annotations added to the same method.
>>
>> Before we start using @TestOnly, I think we need some guidelines for when
>> to use @TestOnly versus @VisibleForTesting. Or if we're going to replace
>> @VisibleForTesting with @TestOnly, then we either need a PR for the
>> replacement or, at a minimum, deprecation annotation and javadocs added to
>> VisibleForTesting.java.
>>
>> The annotations appear similar but the javadocs describe slightly
>> different meanings for them...
>>
>> *@VisibleForTesting* was 

Re: Using Ports Safely in Tests

2021-11-04 Thread Kirk Lund
Dale, Thanks for doing this work and especially for adding these guidelines
to the Wiki!

On Thu, Nov 4, 2021 at 1:08 PM Dale Emery  wrote:

> As of July, 2021, Geode's build system no longer executes test worker
> processes in separate Docker containers.
>
> This increases the risk of port collisions between tests. Each test worker
> JVM and each Java process started by a test executes directly in the
> environment provided by the host machine. If a test is using a port, any
> concurrently running test that tries to bind to the same port number will
> suffer a BindException.
> Safely allocating ports. To reduce this risk, Geode's build system now
> assigns each test worker JVM a distinct range of ports. In CI, which runs
> 24 distributed tests concurrently, each test gets a range of about 400
> ports.
> In JVMs running Geode version 1.14 or later, AvailablePort and
> AvailablePortHelper will choose ports from this assigned range. As a
> result, no two tests will ever receive the same port number from these
> classes, no matter how many are running concurrently.
> When you write a test, keep the following guidelines in mind, to make sure
> your test uses only the ranges of ports that the Geode build system
> assigned to it:
>
>   *   To assign an available port, call AvailablePortHelper or
> AvailablePort. These are the only classes that know about the reduced port
> range for the test.
>   *   Call AvailablePort and AvailablePortHelper only in the current
> version of Geode. Do not call these classes in Child VMs running older
> versions of Geode. Older versions of Geode do not honor the new, reduced
> port ranges.
>   *   Do not launch any service using Geode’s default port for that
> service. Always explicitly assign an available port. The only safe use of
> default ports is in a test to verify that a service uses its default port
> by default. Note that such a test will fail the stress test CI job. To
> exclude such a test from the stress test job, annotate it with
> @Category(IgnoreInRepeatTestTasks.class).
>   *   Do not launch any service using a port number hard-coded into the
> test. Always explicitly assign an available port. There is no safe use of
> any hard-coded port number in tests that can run concurrently.
>   *   Do not attempt to reuse an ephemeral port. Some tests start a member
> on an ephemeral port, stop the member, and attempt to restart it on the
> same port. This is not safe. It was never safe. During the time between
> when the member stops and when it restarts, the port is available for the
> operating system to give to any other process. In CI, it is very, very
> likely that one or more concurrently-running tests (or other processes)
> will request ephemeral ports during the time when your test is not bound to
> it. If one of those processes gets the port your test was using, your test
> will fail.
> This information is available on the Geode wiki:
> https://cwiki.apache.org/confluence/display/GEODE/Using+Ports+Safely+in+Tests
>
> Dale
>
>


Re: @TestOnly or @VisibleForTesting

2021-11-04 Thread Kirk Lund
As everyone thinks about how we want to use these annotations, please keep
this in mind that both *@VisibleForTesting* and *@TestOnly* can be used on
Types (Class/Interface/Enum), Constructors, Methods and Fields. (not just
Methods)

On Thu, Nov 4, 2021 at 3:09 PM Kirk Lund  wrote:

> Hey all,
>
> We're introducing a mess to the codebase. It's a small problem, but
> several small problems become a big problem and one of my missions is to
> clean up and improve the codebase.
>
> I recently started seeing lots of pull requests with usage of @TestOnly.
> Sometimes it's used instead of @VisibleForTesting, while sometimes I see
> both annotations added to the same method.
>
> Before we start using @TestOnly, I think we need some guidelines for when
> to use @TestOnly versus @VisibleForTesting. Or if we're going to replace
> @VisibleForTesting with @TestOnly, then we either need a PR for the
> replacement or, at a minimum, deprecation annotation and javadocs added to
> VisibleForTesting.java.
>
> The annotations appear similar but the javadocs describe slightly
> different meanings for them...
>
> *@VisibleForTesting* was created in Geode several years ago to mean that
> the method is either only for testing or the visibility of it was widened
> (example: a private method might be widened to be package-private,
> protected or public). The method might be used by product code, but it also
> has widened scope specifically to allow tests to call it. The javadocs say:
>
> "Annotates a program element that exists, or is more widely visible than
> otherwise necessary, only for use in test code.
>
> Introduced while mobbing with Michael Feathers. Name and javadoc borrowed
> from Guava and AssertJ (both are Apache License 2.0)."
>
> *@TestOnly* started appearing when we added org.jetbrains.annotations
> dependency earlier this year. It seems to indicate a method that is ONLY
> used for tests (never called by product). The javadocs say:
>
> "A member or type annotated with TestOnly claims that it should be used
> from testing code only.
>
> Apart from documentation purposes this annotation is intended to be used
> by static analysis tools to validate against element contract violations.
>
> This annotation means that the annotated element exposes internal data and
> breaks encapsulation of the containing class; the annotation won't prevent
> its use from production code, developers even won't see warnings if their
> IDE doesn't support the annotation. It's better to provide proper API which
> can be used in production as well as in tests."
>
> So... when do we use one over the other? I don't think both annotations
> should be on the same method. Also, some sort of guidelines are needed if
> we're going to start using @TestOnly.
>


@TestOnly or @VisibleForTesting

2021-11-04 Thread Kirk Lund
Hey all,

We're introducing a mess to the codebase. It's a small problem, but several
small problems become a big problem and one of my missions is to clean up
and improve the codebase.

I recently started seeing lots of pull requests with usage of @TestOnly.
Sometimes it's used instead of @VisibleForTesting, while sometimes I see
both annotations added to the same method.

Before we start using @TestOnly, I think we need some guidelines for when
to use @TestOnly versus @VisibleForTesting. Or if we're going to replace
@VisibleForTesting with @TestOnly, then we either need a PR for the
replacement or, at a minimum, deprecation annotation and javadocs added to
VisibleForTesting.java.

The annotations appear similar but the javadocs describe slightly different
meanings for them...

*@VisibleForTesting* was created in Geode several years ago to mean that
the method is either only for testing or the visibility of it was widened
(example: a private method might be widened to be package-private,
protected or public). The method might be used by product code, but it also
has widened scope specifically to allow tests to call it. The javadocs say:

"Annotates a program element that exists, or is more widely visible than
otherwise necessary, only for use in test code.

Introduced while mobbing with Michael Feathers. Name and javadoc borrowed
from Guava and AssertJ (both are Apache License 2.0)."

*@TestOnly* started appearing when we added org.jetbrains.annotations
dependency earlier this year. It seems to indicate a method that is ONLY
used for tests (never called by product). The javadocs say:

"A member or type annotated with TestOnly claims that it should be used
from testing code only.

Apart from documentation purposes this annotation is intended to be used by
static analysis tools to validate against element contract violations.

This annotation means that the annotated element exposes internal data and
breaks encapsulation of the containing class; the annotation won't prevent
its use from production code, developers even won't see warnings if their
IDE doesn't support the annotation. It's better to provide proper API which
can be used in production as well as in tests."

So... when do we use one over the other? I don't think both annotations
should be on the same method. Also, some sort of guidelines are needed if
we're going to start using @TestOnly.


Re: Test failures on Windows with insufficient memory for the JRE while running distributed tests

2021-10-26 Thread Kirk Lund
PS: I should also mention that the *windows-gfsh-distributed* test target
is only run on Windows (never on Linux). It might be useful to try getting
windows-gfsh-distributed running on LInux to see if it hits the same issue
on that OS. This would also require some help from a pipeline expert.

On Tue, Oct 26, 2021 at 10:22 AM Kirk Lund  wrote:

> Hi Alberto,
>
> 32 kb is a very small amount of memory, so I don't think it's related to
> Java Heap. Based on what little I've read today, I think a failure in
> ChunkPool::allocate is probably related to either *running out of swap
> space or running out of address space in a 32 bit JVM*. Since the
> failures are OS specific, I would suspect the machine image we use for
> Windows to be involved.
>
> I also notice that this ChunkPool::allocate failure is only occurring for
> the Gfsh distributed tests which is the only job run on Windows that uses
> Gradle support for *JUnit Categories*. The Gradle target is
> distributedTest which we have configured with "*forkEvery 1*" which
> causes every test class to launch in a new JVM. Gradle implements JUnit
> 4 Category filtering by launching every test class to check the Categories
> and then either executes the tests or terminates without running any
> depending on the Categories.
>
> Some things I would check (or ask others about):
>
> *Is the harddrive space much smaller than what's available to the JVM(s)
> on Linux?*
>
> *Do the Gfsh distributed tests on Windows leave behind more artifacts on
> the harddrive than other test targets?*
>
> *Is it possible that the tests are using a 32-bit JVM on Windows? Or maybe
> the tests are spawning Gfsh process(es) using a 32-bit JVM instead of
> 64-bit?*
>
> *Are we running the Gfsh distributed tests in parallel (which might
> exacerbate harddrive swapping or memory consumption)?*
>
> Unfortunately, I don't know what most of the options in
> jinja.variables.yml are about. I think it would be best to get help from an
> expert in the OS images and pipeline details.
>
> Cheers,
> Kirk
>
> On Tue, Oct 26, 2021 at 12:59 AM Alberto Gomez 
> wrote:
>
>> Hi,
>>
>> I am having issues with insufficient memory for the Java Runtime
>> Environment when running some tests on the CI under Windows from the
>> following PR :
>> https://github.com/apache/geode/pull/7006
>>
>> The tests never fail under Linux.
>>
>> This is the error I get for some VMs:
>>
>> [vm4] # There is insufficient memory for the Java Runtime Environment to
>> continue.
>> [vm4] # Native memory allocation (malloc) failed to allocate 32744 bytes
>> for ChunkPool::allocate
>>
>> I have reduced the amount of resources used originally by the tests but
>> still I am not able to get a clean execution.
>>
>> I do not know if it is a matter of changing the parameters for the
>> windows execution in ci/pipelines/shared/jinja.variables.yml or if there is
>> anything else to consider.
>>
>> I would appreciate if someone from the community could help me
>> troubleshoot this issue.
>>
>> Thanks in advance,
>>
>> Alberto
>>
>>
>>


Re: Test failures on Windows with insufficient memory for the JRE while running distributed tests

2021-10-26 Thread Kirk Lund
Hi Alberto,

32 kb is a very small amount of memory, so I don't think it's related to
Java Heap. Based on what little I've read today, I think a failure in
ChunkPool::allocate is probably related to either *running out of swap
space or running out of address space in a 32 bit JVM*. Since the failures
are OS specific, I would suspect the machine image we use for Windows to be
involved.

I also notice that this ChunkPool::allocate failure is only occurring for
the Gfsh distributed tests which is the only job run on Windows that uses
Gradle support for *JUnit Categories*. The Gradle target is distributedTest
which we have configured with "*forkEvery 1*" which causes every test class
to launch in a new JVM. Gradle implements JUnit 4 Category filtering by
launching every test class to check the Categories and then either
executes the tests or terminates without running any depending on the
Categories.

Some things I would check (or ask others about):

*Is the harddrive space much smaller than what's available to the JVM(s) on
Linux?*

*Do the Gfsh distributed tests on Windows leave behind more artifacts on
the harddrive than other test targets?*

*Is it possible that the tests are using a 32-bit JVM on Windows? Or maybe
the tests are spawning Gfsh process(es) using a 32-bit JVM instead of
64-bit?*

*Are we running the Gfsh distributed tests in parallel (which might
exacerbate harddrive swapping or memory consumption)?*

Unfortunately, I don't know what most of the options in jinja.variables.yml
are about. I think it would be best to get help from an expert in the OS
images and pipeline details.

Cheers,
Kirk

On Tue, Oct 26, 2021 at 12:59 AM Alberto Gomez 
wrote:

> Hi,
>
> I am having issues with insufficient memory for the Java Runtime
> Environment when running some tests on the CI under Windows from the
> following PR :
> https://github.com/apache/geode/pull/7006
>
> The tests never fail under Linux.
>
> This is the error I get for some VMs:
>
> [vm4] # There is insufficient memory for the Java Runtime Environment to
> continue.
> [vm4] # Native memory allocation (malloc) failed to allocate 32744 bytes
> for ChunkPool::allocate
>
> I have reduced the amount of resources used originally by the tests but
> still I am not able to get a clean execution.
>
> I do not know if it is a matter of changing the parameters for the windows
> execution in ci/pipelines/shared/jinja.variables.yml or if there is
> anything else to consider.
>
> I would appreciate if someone from the community could help me
> troubleshoot this issue.
>
> Thanks in advance,
>
> Alberto
>
>
>


Re: Added Support for JUnit 5

2021-10-13 Thread Kirk Lund
Good job Dale and thanks!

On Tue, Oct 12, 2021 at 3:37 PM Dale Emery  wrote:

> In September 2021, Geode added support for writing and running tests using
> JUnit 5.
>
> Most Geode modules now support JUnit 5. For most Geode modules, you can
> now write each test class using either JUnit 5's "Jupiter" API or the
> legacy JUnit 4 API.
>
> Which modules support JUnit 5? Any source set that depends on geode-junit
> or geode-dunit already has JUnit 5 support. For those source sets you can
> start writing tests using the JUnit Jupiter API now, and Gradle will run
> them.
>
> To add JUnit 5 support to a module or source set: Add lines like these to
> the "dependencies" configuration of the module’s build.gradle file:
>
>
> testImplementation('org.junit.jupiter:junit-jupiter-api')
>
> testRuntimeOnly('org.junit.jupiter:junit-jupiter-engine')
>
> The first line allows you to write unit tests using the JUnit Jupiter API.
> The second line allows Gradle to run your JUnit Jupiter unit tests.
>
> To use JUnit Jupiter to write and run other kinds of tests (e.g.
> integrationTest or distributedTest), add similar lines to configure the
> appropriate source sets.
>
> LIMITATIONS
>
>   *   Because Geode support for JUnit Jupiter is so new, we have not yet
> added test framework code that takes advantage of its features.
>   *   JUnit Jupiter does not support the use of Rules.
>
> SEE ALSO
>
>   *   The JUnit 5 User Guide:
> https://junit.org/junit5/docs/current/user-guide/
>   *   Using JUnit 5 (a copy of this message on the Geode wiki):
> https://cwiki.apache.org/confluence/display/GEODE/Using+JUnit+5
>
> Dale Emery
>
>


The Apache Geode dedication to software quality

2021-09-03 Thread Kirk Lund
The Apache Geode project has built up a development process and community
standards since its initial start as an Apache podling in 2014 with the
purpose of releasing software of the highest quality developed by engineers
who believe in the project and its future. Our DOD (Definition of Done) may
be sparse and loose, but it captures some of the most important points that
have been discussed on the dev-list and agreed upon by this community. Its
essence is a focus on quality, both for users of Geode and developers who
will contribute and maintain the code base.

Our DOD [1] includes the following on developing code to be released:

CONTINUOUS DEVELOPMENT ACTIVITIES
+ Design reviewed on Wiki, especially for larger features
+ Implemented according to specification in the ticket
+ Unit tested
+ Integration tested
+ Public/ Peer-reviewed
+ Merged into 'develop'
+ JIRA Ticket marked 'resolved'

All committers must understand and follow our "development process and
community standards" [2] as discussed on the dev-list or captured on the
Apache Geode Wiki. Contributors who hope to actively contribute must learn
and follow this process. It's the responsibility of all committers to help
educate and guide these contributors in both our process and Code of
Conduct [3].

Everyone should be encouraged to ask questions and discuss how we can best
deliver high-quality software that will be easy to maintain by all Apache
Geode developers in the years to come.

Quality impassioned,
Kirk Lund

[1] https://cwiki.apache.org/confluence/display/GEODE/Definition+of+Done
[2] https://cwiki.apache.org/confluence/display/GEODE/Becoming+a+committer
[3] https://cwiki.apache.org/confluence/display/GEODE/Code+of+Conduct


Re: Revert GEODE-9486: Fix validate-serializable-objects (#6746)

2021-08-30 Thread Kirk Lund
PS: This PR needs a lot of reviews. Please review as soon as possible so I
can merge the revert to develop to restore the CI to green.

On Mon, Aug 30, 2021 at 2:42 PM Kirk Lund  wrote:

> I have submitted PR #6816 to temporarily revert GEODE-9486 until it's
> ready to go back in.
>
> https://github.com/apache/geode/pull/6816
>
> Sha 3b2c53 is failing consistently on Windows, so it needs to be reverted.
> I hope to have a fix ready sometime next week.
>
> Thanks,
> Kirk
>
>


Revert GEODE-9486: Fix validate-serializable-objects (#6746)

2021-08-30 Thread Kirk Lund
I have submitted PR #6816 to temporarily revert GEODE-9486 until it's ready
to go back in.

https://github.com/apache/geode/pull/6816

Sha 3b2c53 is failing consistently on Windows, so it needs to be reverted.
I hope to have a fix ready sometime next week.

Thanks,
Kirk


Re: Revert "GEODE-9369: Command to copy region entries from a WAN site to… #6811

2021-08-27 Thread Kirk Lund
I've merged the revert to develop. Thanks for all the reviews!

On Fri, Aug 27, 2021 at 9:56 AM Kirk Lund  wrote:

> Please review PR #6811 so we can revert GEODE-9369 to clear up the failing
> unit test. Hopefully this will be ready to be reintroduced next week.
>
> If anyone knows of follow-on commits after c9d465, please let me know.
>
> Revert "GEODE-9369: Command to copy region entries from a WAN site to…
> #6811
> https://github.com/apache/geode/pull/6811
>
> Thanks,
> Kirk
>
>


Revert "GEODE-9369: Command to copy region entries from a WAN site to… #6811

2021-08-27 Thread Kirk Lund
Please review PR #6811 so we can revert GEODE-9369 to clear up the failing
unit test. Hopefully this will be ready to be reintroduced next week.

If anyone knows of follow-on commits after c9d465, please let me know.

Revert "GEODE-9369: Command to copy region entries from a WAN site to… #6811
https://github.com/apache/geode/pull/6811

Thanks,
Kirk


Re: Revert GEODE-9408: Avoid duplicate events sent by Serial Gateway Sender when group-transaction-events is true (#6663)

2021-08-27 Thread Kirk Lund
That PR was reverting the wrong commit, so I've closed it. I'll submit a
new PR today to revert the introduction of WanCopyRegionFunction and its
unit test. I'm going to work with Alberto next week to address some
testability issues so it can be reintroduced as soon as possible.

Thanks,
Kirk

On Thu, Aug 26, 2021 at 4:56 PM Kirk Lund  wrote:

> I have submitted PR #6809 to temporarily revert GEODE-9408 until it's
> ready to go back in.
>
> https://github.com/apache/geode/pull/6809
>
> I've reviewed the code that went in with #b377e3, and I think we should
> revert it to rework WanCopyRegionFunction and its unit test. The unit test
> currently mocks three methods in WanCopyRegionFunction (known as partial
> mocking) which is a sign that dependencies are hidden inside the class and
> need to be pulled up to the constructor.
>
> WanCopyRegionFunction also has a static final ExecutorService which is
> never shutdown. The ExecutorService should live outside
> WanCopyRegionFunction, be passed in via the constructor, and be shutdown
> when the Cache closes.
>
> The intermittent test failures are not specific to Windows and are likely
> to also fail on a slow Linux machine.
>
> We really need to avoid Thread sleeps especially in unit tests. This and
> the partial mocking are signs that the class needs to change to better
> enable unit testing.
>
> I'll be more than happy to help but I think we need to revert it so we
> have time to discuss it without everyone else being affected by failing
> tests.
>
> Thanks,
> Kirk
>


Revert GEODE-9408: Avoid duplicate events sent by Serial Gateway Sender when group-transaction-events is true (#6663)

2021-08-26 Thread Kirk Lund
I have submitted PR #6809 to temporarily revert GEODE-9408 until it's ready
to go back in.

https://github.com/apache/geode/pull/6809

I've reviewed the code that went in with #b377e3, and I think we should
revert it to rework WanCopyRegionFunction and its unit test. The unit test
currently mocks three methods in WanCopyRegionFunction (known as partial
mocking) which is a sign that dependencies are hidden inside the class and
need to be pulled up to the constructor.

WanCopyRegionFunction also has a static final ExecutorService which is
never shutdown. The ExecutorService should live outside
WanCopyRegionFunction, be passed in via the constructor, and be shutdown
when the Cache closes.

The intermittent test failures are not specific to Windows and are likely
to also fail on a slow Linux machine.

We really need to avoid Thread sleeps especially in unit tests. This and
the partial mocking are signs that the class needs to change to better
enable unit testing.

I'll be more than happy to help but I think we need to revert it so we have
time to discuss it without everyone else being affected by failing tests.

Thanks,
Kirk


Re: Annual branch cleanup Aug 17

2021-08-09 Thread Kirk Lund
I confirmed that GEODE-9064-JMX-filter-support_1.12-02 is/was one of my
branches. I'm 100% for deleting it and the others. Thanks Owen!

On Sun, Aug 1, 2021 at 1:01 AM Owen Nichols  wrote:

> Reminder to use your personal fork whenever possible.  If you do create a
> branch in the geode repo, please clean it up promptly.
> An easy way to check is https://github.com/apache/geode/branches/yours
>
> I propose to delete the branches listed below on August 17 at 3pm.  If
> there’s any branch you want to keep for reference, please push it to your
> own fork before then.
> I will assume lazy consensus (no need to respond unless you disagree).
>
> I will delete the following branches:
> GEM-3288-network-interfaces
> revert-6429-feature/GEODE-9220-set
> feature/GEODE-6316
> feature/GEODE-6770
> feature/GEODE-7277
> feature/vSphereTests
> feature/GEODE-7109
> feature/GEODE-8118
> feature/GEODE-6901
> mass-test-run
> wip/throw-FDE-instead
> feature/GEODE-8324
> feature/GEODE-8444
> feature/change-readme-ownership
> feature/redis-memory-overhead
> feature/redis-performance-testing
> GEODE-9064-JMX-filter-support_1.12-02
>
> I will delete the following branches and close their associated PR (please
> push to your own fork and make a new PR from there before Aug 17):
> upthewaterspout-skip-checks
> expireAuthentication
> feature/GEODE-9191
> feature/GEODE-8278-2
>
> This branch has previously been approved by the community to remain in the
> geode repo and will NOT be deleted:
> feature/GEODE-7665
>
>


Re: "create region" cmd stuck on wan setup

2021-07-29 Thread Kirk Lund
Please file a bug for this. Even if that ordering doesn't work, nothing
should hang because of it.

Thanks,
Kirk

On Wed, Jul 28, 2021 at 5:27 PM Barrett Oglesby  wrote:

> I reproduced your issue with your scripts.
>
> They do:
>
> create gateway-receiver
> create disk-store
> create gateway-sender
> create region
>
> With that order, I see the hang you mentioned. I'm not 100% sure why that
> is happening but you can prevent it by reordering these elements.
>
> As Anil said, you should start your GatewayReceiver last like:
>
> create disk-store
> create gateway-sender
> create region
> create gateway-receiver
>
> With that order, cluster1 restarts fine.
>
> btw 1 - with the order you had regardless of the hang, you'll see lots of
> dropped WAN events since the region doesn't exist yet when the receiver is
> started:
>
> [info 2021/07/28 17:02:39.795 PDT server1_1  Processor2> tid=0x3c] The GatewayReceiver started on port : 5411
>
> [warn 2021/07/28 17:02:39.883 PDT server1_1  Thread 1> tid=0x4a] Server connection from
> [identity(192.168.1.7(server2_2:25891):41005,connection=1; port=52554]:
> Caught exception processing batch create request 0 for 100 events
> org.apache.geode.cache.RegionDestroyedException: Region /testregion was
> not found during batch create request 0
>
> btw 2 - I use CacheCreation.create to see the order that elements should
> be started. Thats the object that the old GemFire cache xml uses to start
> things in the right order.
>
> Barry
> 
> From: Anilkumar Gingade 
> Sent: Wednesday, July 28, 2021 3:45 PM
> To: dev@geode.apache.org 
> Subject: Re: "create region" cmd stuck on wan setup
>
> The recommendation with WAN setup is:
> - Create/start WAN Senders first
> - Create Regions
> - Create/Start WAN receivers last
>
> That way when wan receiver is started; the regions are created on all the
> sites. Sorry, I have not looked at your scripts...
>
> -Anil.
>
>
>
> On 7/28/21, 3:31 AM, "Alberto Bustamante Reyes"
>  wrote:
>
> Hi Geode devs,
>
> I have been analyzing an issue that occurs in the following scenario:
>
> 1) I start two Geode clusters (cluster1 & cluster2) with one locator
> and two servers each.
> Both clusters host a partitioned region called "testregion", which is
> replicated using a parallel gateway sender and a gateway receiver.
> These are the gfsh files I have been using for creating the clusters:
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgist.github.com%2Falb3rtobr%2Fe230623255632937fa68265f31e97f3adata=04%7C01%7Cboglesby%40vmware.com%7C6e6bff680f5d46c6bbcc08d952195ff7%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637631091210347322%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=V%2Fnsqn8wiEnEpjf9GQZ4Ta38rPk5ha79RYqlZWZIXzY%3Dreserved=0
>
> 2) I run a client connected to cluster2 performing operations on
> testregion.
>
> 3) cluster1 is stopped and all persistent data is deleted. And then, I
> create cluster1 again.
>
> 4) At this point, the command to create "testregion" get stuck.
>
>
> After checking the thread stack and the code, I found that the problem
> is the following.
>
> This thread is trapped on an infinite loop waiting for a bucket
> primary election at "PartitionedRegion.waitForNoStorageOrPrimary":
>
>
> "Function Execution Processor4" tid=0x55
> java.lang.Thread.State: TIMED_WAITING
> at java.base@11.0.11/java.lang.Object.wait(Native Method)
> -  waiting on
> org.apache.geode.internal.cache.BucketAdvisor@28be7ae0
> at
> app//org.apache.geode.internal.cache.BucketAdvisor.waitForPrimaryMember(BucketAdvisor.java:1433)
> at
> app//org.apache.geode.internal.cache.BucketAdvisor.waitForNewPrimary(BucketAdvisor.java:825)
> at
> app//org.apache.geode.internal.cache.BucketAdvisor.getPrimary(BucketAdvisor.java:794)
> at
> app//org.apache.geode.internal.cache.partitioned.RegionAdvisor.getPrimaryMemberForBucket(RegionAdvisor.java:1032)
> at
> app//org.apache.geode.internal.cache.PartitionedRegion.getBucketPrimary(PartitionedRegion.java:9081)
> at
> app//org.apache.geode.internal.cache.PartitionedRegion.waitForNoStorageOrPrimary(PartitionedRegion.java:3249)
> at
> app//org.apache.geode.internal.cache.PartitionedRegion.getNodeForBucketWrite(PartitionedRegion.java:3234)
> at
> app//org.apache.geode.internal.cache.PartitionedRegion.shadowPRWaitForBucketRecovery(PartitionedRegion.java:10110)
> at
> app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:564)
> at
> app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:443)
> at
> 

Re: Member status using working directory

2021-07-29 Thread Kirk Lund
I filed https://issues.apache.org/jira/browse/GEODE-6841 to make this
change. We would remove our dependency on the Attach API and Geode would
then only use FileProcessController.

Once the Attach API is no longer used "status server --pid=###" will no
longer work, so I added a comment to GEODE-6841 indicating that the change
is a breaking change requiring us to wait until Geode 2.0.

I would expect --dir to only use FileProcessController, but the code you
linked clearly uses the pid. Please file a bug and assign it to me. I
should be able to fix that quickly.

Thanks,
Kirk


On Thu, Jul 22, 2021 at 3:47 AM Mario Salazar de Torres
 wrote:

> Hi,
>
> While doing some tests with cloud-native integration I noticed that if you
> try to run "status server --dir=" the logic doesn't work
> exactly as we would have expected.
>
> In this use case we are running the command in offline mode. Therefore
> ServerLauncher.ServersState.statusWithWorkingDirectory (
> https://github.com/apache/geode/blob/645d3530f8f6a22432afe27c3381fe318ed59654/geode-core/src/main/java/org/apache/geode/distributed/ServerLauncher.java#L1169
> ) is used to resolve the status command, which internally tries to read the
> PID member from the PID file and ultimately connect thru the JMX interface.
> Thing is that given the command is executed offline and only the working
> directory is specified, so I would expect that "FileProcessController"
> takes care of the request, rather than "MBeanProcessController", but this
> is not the case.
>
> To fully understand the motivation, you should know that within
> "MBeanProcessController" the JMX interface URL is obtained by attaching to
> the VM with its PID
> In our use case, the problem is that we have 2 containers, 1 for the
> member and another for a member watcher, both sharing the member workdir
> volume.
> And when we try to check the status from the watcher container the request
> fails because it tries to attach to the VM using a PID which is not on its
> PID namespace.
>
> So here are my questions:
>
>   *Is this logic working as you expected?
>   *   There is a reason behind this logic?
>   *   Would it be possible to change the logic, so every time "status
> server --dir=" in offline mode "FileProcessController" is
> used?
>
> Thanks,
> Mario
>


Re: NullPointerException while create region during server restart

2021-07-08 Thread Kirk Lund
Hi Mario,

I would guess that *getPdxRegistry()* is returning a null until after the
registry has finished initializing. Just a guess though.

Here's a spreadsheet
a
couple of us created and used as a reference for some work about a year and
a half ago. The source code line numbers probably aren't correct anymore,
but the order of steps and general details should still be accurate. As
you'll see, the PDX region (aka the PDX registry) is created at step 19 of
the spreadsheet.

Step 26 is the creation of the CacheServerMXBean.
Step 29 marks 'Online' status change.

You need to wait for all servers to reach 'Online' on Step 29 of the
spreadsheet before making any changes like creating regions.

To understand how to identify 'Online', take a look at these two acceptance
tests:
1.
geode-assembly/src/acceptanceTest/java/org/apache/geode/launcher/ServerStartupOnlineTest.java
2.
geode-assembly/src/acceptanceTest/java/org/apache/geode/launcher/ServerStartupNotificationTest.java

-Kirk

On Tue, Jul 6, 2021 at 12:06 AM Mario Kevo  wrote:

> Hi Geode devs,
>
> I opened a new ticket https://issues.apache.org/jira/browse/GEODE-9409
> regarding NullPointerException on creating region while one of the servers
> is restarting.
> If we run the "create region" command through gfsh while the server is
> starting it passed, but if the server is restarted then it fails. The
> difference is that when we restarted the server, we kill them and start
> again. As it has already a server directory, it takes more time to get the
> server up as expected.
> In that case, if we run the "create region" command it can happen that the
> cache is not fully created and we are trying to do something on that. That
> can lead to the NullPointerException, as creating region catches
> pdxRegistry from the cache while doing findDiskStore, but sometimes it is
> not initialized in the cache yet. So every method run against that will
> throw NullPoniterException.
> There is a part of the code where the exception is thrown:
>
> DiskStoreImpl findDiskStore(RegionAttributes regionAttributes,
> InternalRegionArguments internalRegionArgs) {
>   // validate that persistent type registry is persistent
>   if (getAttributes().getDataPolicy().withPersistence()) {
> getCache().getPdxRegistry().creatingPersistentRegion();
>   }
>
> As I already mention, getPdxRegistry(LocalRegion.java) will be null if it
> is not yet initialized in create(CacheCreation.java):
>
> DiskStoreAttributesCreation pdxRegDSC = initializePdxDiskStore(cache);
>
> cache.initializePdxRegistry();
>
> createDiskStores(cache, pdxRegDSC);
>
> I tried to do some fixes, but without a success. 
> It can be passed if we add some retry and sleep, but that is not
> acceptable.
>
> So if someone has some idea how to do some wait until pdxRegistry is
> initialized or something else what will help us to avoid this problem?
>
> BR,
> Mario
>


ParallelGatewaySenderQueue$BatchRemovalThread prints NPE

2021-06-24 Thread Kirk Lund
Can someone who has been working on GatewaySender please take a look at
this NPE?

It didn't cause any test failures, but it does show up in the output of my
unit-test-openjdk11 task for one of my PRs:
https://concourse.apachegeode-ci.info/builds/53189

> Task :extensions:geode-modules-tomcat8:testClasses




*> Task :geode-web:testException in thread "BatchRemovalThread for
GatewaySender_sender_2" java.lang.NullPointerException at
org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.checkCancelled(ParallelGatewaySenderQueue.java:1835)
at
org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.run(ParallelGatewaySenderQueue.java:1936)*>
Task :extensions:geode-modules-tomcat9:compileTestJava
> Task :extensions:geode-modules-tomcat9:testClasses

Thanks,
Kirk


Re: [DISCUSS] Remove stress-new-test-openjdk11 requirement from PRs

2021-06-15 Thread Kirk Lund
My preference would be to have ALL test targets be mandatory to merge any
PR, but in my opinion that would require some sort of plan to fix all
flickering tests. Currently, we have a person assigned to monitor CI
everyday who then files Jira tickets against any test that intermittently
fails, but presently the vast majority of those Jira tickets remain
unassigned and are never fixed.

A couple of options right now. Make both stress-new-tests required for
merge:

1) but only for new tests

2) with zero tolerance for flickering tests -- which means all PRs not
directly associated with fixing flickering tests are frozen indefinitely
until ALL Jira tickets for flickering tests are fixed, and this process is
then repeated every time we have an intermittent failure in CI

-Kirk

On Thu, Jun 10, 2021 at 10:56 AM Dan Smith  wrote:

> I'm also -0 on this one. I personally kinda like having the merge button
> disabled on my PRs until all the checks are passing so I don't have to
> think as much, and I'm not sure this problem is going to come up again
> soon. But I'm willing go with disabling it.
>
> -Dan
> 
> From: Mark Hanson 
> Sent: Thursday, June 10, 2021 10:15 AM
> To: dev@geode.apache.org 
> Subject: Re: [DISCUSS] Remove stress-new-test-openjdk11 requirement from
> PRs
>
> +1 to new report from Owen
> +1 to re-introduce the stress-test
>
> Great ideas Naba!
>
> On 6/10/21, 9:50 AM, "Nabarun Nag"  wrote:
>
> Hi all,
>
>
>   *   We need to discuss how to prevent more flaky tests to be
> introduced now that stress-test is not mandatory for PRs to be merged?
> Reviewers checking the PR must check the tests failing in stress test and
> if it is a test that has been introduced or changed in the PR, the PR must
> be blocked with a change request or rejected.
>   *   Also, in my opinion, we need to re-introduce the stress test as
> a mandatory check for PRs to be merged once the flaky test percentage has
> been reduced.
>
> Owen, will it be possible to put out a list of all the flaky tests and
> we can try to get these resolved collectively as a community. (results of
> the mass tests maybe). Thank you.
>
>
> Regards
> Nabarun Nag
> 
> From: Kirk Lund 
> Sent: Thursday, June 10, 2021 9:37 AM
> To: dev@geode.apache.org 
> Subject: Re: [DISCUSS] Remove stress-new-test-openjdk11 requirement
> from PRs
>
> Ok, I wanted to give this discussion another night and we still have
> consensus for making both stress-new-test non-required.
>
> I just filed PR #6602 <
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode%2Fpull%2F6602data=04%7C01%7Cdasmith%40vmware.com%7C58dd1b5852174343eebb08d92c335953%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637589421336670457%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=%2Fuc8WXORNjjYjFKNgGjqMdS%2FhK9FGFBepNCNqmfp1GQ%3Dreserved=0>
> to change
> stress-new-test from required to non-required. Please review!
>
> On Wed, Jun 9, 2021 at 2:11 PM Anthony Baker 
> wrote:
>
> > If we have consensus, no need to VOTE.
> >
> > > On Jun 9, 2021, at 12:52 PM, Owen Nichols 
> wrote:
> > >
> > > Ok, I'm on board with changing stress-new-test from a required PR
> check
> > to non-required.  It's simple, codeowners still have the final say,
> and it
> > neatly avoids the whole topic of overrides.  Time to take a [VOTE]?
> > >
> >
> >
>
>


Re: [DISCUSS] Remove stress-new-test-openjdk11 requirement from PRs

2021-06-10 Thread Kirk Lund
Ok, I wanted to give this discussion another night and we still have
consensus for making both stress-new-test non-required.

I just filed PR #6602  to change
stress-new-test from required to non-required. Please review!

On Wed, Jun 9, 2021 at 2:11 PM Anthony Baker  wrote:

> If we have consensus, no need to VOTE.
>
> > On Jun 9, 2021, at 12:52 PM, Owen Nichols  wrote:
> >
> > Ok, I'm on board with changing stress-new-test from a required PR check
> to non-required.  It's simple, codeowners still have the final say, and it
> neatly avoids the whole topic of overrides.  Time to take a [VOTE]?
> >
>
>


Re: [DISCUSS] Remove stress-new-test-openjdk11 requirement from PRs

2021-06-09 Thread Kirk Lund
I do like the suggestions offered up by Dale and would encourage (or even
plead) with my fellow contributors to consider these:

  *   Allow code owners to override the block, if they can be convinced the
> override is justified.
>   *   Exclude troublesome tests from stress test runs, either via
> annotations or via an `assumeThat(…)` that can detect that it’s running as
> a stress test. Whatever the mechanism for excluding, it would be in the
> code, and therefore subject to code owner review. (This, too, feels overly
> broad to me, as it would exclude the test from all stress test runs.)
>   *   A way to exclude a specific test method from running in the stress
> tests for a specific PR or commit. I don’t have any ideas for how to
> declare such an exclusion, but if it could be done via a file it would be
> subject to code owner review.


1) Allow code owners to override the block, if they can be convinced the
override is justified.

After all, if we don't trust our code owners...

2-3) Use a custom annotation to exclude the test method or test class only
from stress-new-test.

At first I really liked this idea, but then we end up with growing a
collection of flaky tests that are excluded in some way from
stress-new-test that still occasionally fail in distributedTest.

#1 really sounds like the best option to me. I believe that leaving our
stress-new-test process as-is will only discourage everyone from fixing one
or two flaky tests in a large dunit test. However, I also believe that if
we give code owners the authority to override stress-new-test, then we
need to encourage them not to override this too often.

On Tue, Jun 8, 2021 at 11:11 AM Dale Emery  wrote:

> Maybe we can find a way to relax the requirement, or to allow addressing
> specific situations like the tangle you find yourself in.
>
> Removing the requirement altogether feels overly broad. I fear it would
> allow us to quietly disregard all intermittent test failures, and I think
> we already quietly (or even actively) disregard way too many kinds of
> failures.
>
> I would prefer some way to explicitly disregard only the specific test
> failures that prevent us from merging, and only with some amount of
> explicit justification.
>
> I’m not sure what that would look like. Some half-baked possibilities:
>
>   *   Allow code owners to override the block, if they can be convinced
> the override is justified.
>   *   Exclude troublesome tests from stress test runs, either via
> annotations or via an `assumeThat(…)` that can detect that it’s running as
> a stress test. Whatever the mechanism for excluding, it would be in the
> code, and therefore subject to code owner review. (This, too, feels overly
> broad to me, as it would exclude the test from all stress test runs.)
>   *   A way to exclude a specific test method from running in the stress
> tests for a specific PR or commit. I don’t have any ideas for how to
> declare such an exclusion, but if it could be done via a file it would be
> subject to code owner review.
>
> Dale
>
> From: Kirk Lund 
> Date: Tuesday, June 8, 2021 at 9:33 AM
> To: dev@geode.apache.org 
> Subject: [DISCUSS] Remove stress-new-test-openjdk11 requirement from PRs
> Our requirement for stress-new-test-openjdk11 to pass before allowing merge
> doesn't really work as intended for fixing distributed tests that contain
> multiple flaky test methods. In fact, I think it causes contributors to
> avoid tackling flaky tests.
>
> I've been working on GEODE-9103: CI Failure:
> PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED
> <https://issues.apache.org/jira/browse/GEODE-9103> and was able to fix it.
>
> However, stress-new-test-openjdk11 then continued to fail for other flaky
> tests in PutAllClientServerDistributedTest. I managed to fix GEODE-9296 and
> GEODE-8528 as well. I also tried but have not been able to fix GEODE-9242
> which remains flaky.
>
> Unfortunately, I cannot merge any of my fixes for
> PutAllClientServerDistributedTest unless every single flaky test in it is
> fixed by my PR. I think this is undesirable because it would be better to
> merge the fix for 3 flaky test methods than none.
>
> UPDATE: After running my precheckin more times, I did get
> stress-new-test-openjdk11 to pass once so I can merge, but that's more of a
> loophole than anything because I didn't manage to fix GEODE-9242.
>
> Despite having PR #6542 eventually pass, I would like to discuss removing
> or relaxing our requirement that stress-new-test-openjdk11 must pass in
> order to merge a PR...
>
> PR #6542: GEODE-9103: Fix ServerConnectivityExceptions in
> PutAllClientServerDistributedTest
> <
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode%2Fpull

Re: [DISCUSS] Remove stress-new-test-openjdk11 requirement from PRs

2021-06-09 Thread Kirk Lund
I did think about splitting up dunit tests, but I believe
testEventIdOutOfOrderInPartitionRegionSingleHop will remain flaky even if I
move it to a new dunit test. No matter how you dice it up, we end up with a
PR that cannot be merged to develop unless you get lucky after running
stress-new-test many times.

One could try being tricky by marking it with @ignore or deleting the flaky
test in one PR, and then re-add it in a 2nd PR. But, even then that 2nd PR
is very unlikely to pass stress-new-test unless someone fixes the cause of
that test's flakiness.

As it stands, stress-new-test just ends up being a dead-end or an endless
time-sync for fixing multiple flaky tests in one dunit.

On Tue, Jun 8, 2021 at 12:09 PM Dan Smith  wrote:

> Would it be possible to just split that test up into multiple classes? It
> sounds like the issue is that there is so many flaky tests in that class
> that you can't fix them all in one PR, which might indicate it's too big.
>
> If we can't get StressNewTest to pass - that means our builds are failing
> more than 2% of the time due to this one test failure. Yikes!
>
> -Dan
> ____
> From: Kirk Lund 
> Sent: Tuesday, June 8, 2021 9:33 AM
> To: dev@geode.apache.org 
> Subject: [DISCUSS] Remove stress-new-test-openjdk11 requirement from PRs
>
> Our requirement for stress-new-test-openjdk11 to pass before allowing merge
> doesn't really work as intended for fixing distributed tests that contain
> multiple flaky test methods. In fact, I think it causes contributors to
> avoid tackling flaky tests.
>
> I've been working on GEODE-9103: CI Failure:
> PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED
> <
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9103data=04%7C01%7Cdasmith%40vmware.com%7Cb7b22c48e3584f306fa408d92a9b36bd%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668391258409%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=9EmtMx3BhTe2zoLdUMOjCEblldw0VigUMnK3O2Ia%2FRY%3Dreserved=0>
> and was able to fix it.
>
> However, stress-new-test-openjdk11 then continued to fail for other flaky
> tests in PutAllClientServerDistributedTest. I managed to fix GEODE-9296 and
> GEODE-8528 as well. I also tried but have not been able to fix GEODE-9242
> which remains flaky.
>
> Unfortunately, I cannot merge any of my fixes for
> PutAllClientServerDistributedTest unless every single flaky test in it is
> fixed by my PR. I think this is undesirable because it would be better to
> merge the fix for 3 flaky test methods than none.
>
> UPDATE: After running my precheckin more times, I did get
> stress-new-test-openjdk11 to pass once so I can merge, but that's more of a
> loophole than anything because I didn't manage to fix GEODE-9242.
>
> Despite having PR #6542 eventually pass, I would like to discuss removing
> or relaxing our requirement that stress-new-test-openjdk11 must pass in
> order to merge a PR...
>
> PR #6542: GEODE-9103: Fix ServerConnectivityExceptions in
> PutAllClientServerDistributedTest
> <
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode%2Fpull%2F6542data=04%7C01%7Cdasmith%40vmware.com%7Cb7b22c48e3584f306fa408d92a9b36bd%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668391258409%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=Z4WRZBNikFsIEDQuDmEpKwsZO2WqLATudaMix%2BDrfMs%3Dreserved=0
> >
>
> Fixed in PR #6542:
> * GEODE-9296: CI Failure: PutAllClientServerDistributedTest >
> testPartialKeyInPRSingleHopWithRedundancy
> <
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9296data=04%7C01%7Cdasmith%40vmware.com%7Cb7b22c48e3584f306fa408d92a9b36bd%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668391263407%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=%2FiNd3TORn9Al4Y%2BTwbzMmfy3jB7%2F5XxNibhtWtiCOfM%3Dreserved=0
> >
> * GEODE-9103: CI Failure:
> PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED
> <
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9103data=04%7C01%7Cdasmith%40vmware.com%7Cb7b22c48e3584f306fa408d92a9b36bd%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668391263407%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=9LiNnczBArXP%2FoNjyyhRRgDnpJqWgMuQtYRwzscQ2TQ%3Dreserved=0
> >
> * GEODE-8528: PutAllClientServerDistributedTest.testPartialKeyInPRSingleHop
> fails due to missing disk store after server restart
> <
> htt

[DISCUSS] Remove stress-new-test-openjdk11 requirement from PRs

2021-06-08 Thread Kirk Lund
Our requirement for stress-new-test-openjdk11 to pass before allowing merge
doesn't really work as intended for fixing distributed tests that contain
multiple flaky test methods. In fact, I think it causes contributors to
avoid tackling flaky tests.

I've been working on GEODE-9103: CI Failure:
PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED
 and was able to fix it.

However, stress-new-test-openjdk11 then continued to fail for other flaky
tests in PutAllClientServerDistributedTest. I managed to fix GEODE-9296 and
GEODE-8528 as well. I also tried but have not been able to fix GEODE-9242
which remains flaky.

Unfortunately, I cannot merge any of my fixes for
PutAllClientServerDistributedTest unless every single flaky test in it is
fixed by my PR. I think this is undesirable because it would be better to
merge the fix for 3 flaky test methods than none.

UPDATE: After running my precheckin more times, I did get
stress-new-test-openjdk11 to pass once so I can merge, but that's more of a
loophole than anything because I didn't manage to fix GEODE-9242.

Despite having PR #6542 eventually pass, I would like to discuss removing
or relaxing our requirement that stress-new-test-openjdk11 must pass in
order to merge a PR...

PR #6542: GEODE-9103: Fix ServerConnectivityExceptions in
PutAllClientServerDistributedTest


Fixed in PR #6542:
* GEODE-9296: CI Failure: PutAllClientServerDistributedTest >
testPartialKeyInPRSingleHopWithRedundancy

* GEODE-9103: CI Failure:
PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED

* GEODE-8528: PutAllClientServerDistributedTest.testPartialKeyInPRSingleHop
fails due to missing disk store after server restart


Still flaky:
* GEODE-9242: CI failure in PutAllClientServerDistributedTest >
testEventIdOutOfOrderInPartitionRegionSingleHop


Thanks,
Kirk


Re: [Discuss] New feature work approval state in Geode project?

2021-05-28 Thread Kirk Lund
Anil's thoughts parallel what I was trying to describe as well...

On Fri, May 28, 2021 at 2:35 PM Anilkumar Gingade 
wrote:

> My thoughts; I can't make distinction between feature or bug; it’s a
> change to the codebase, if it has greater impact, is sensitive and takes
> time to build; then it is a candidate to bring it up and talk about it
> before implementation. Sometime its hard to determine/distinguish it, we
> developer should make a good judgement of it and be open for any
> suggestion. Currently we have a RFC process; adding more process/steps adds
> additional onus; we can tweak RFC process or replace it with better option
> but let's keep it simple/minimal.
>
>
> On 5/28/21, 11:57 AM, "Jacob Barrett"  wrote:
>
>
>
> > On May 28, 2021, at 11:24 AM, Mark Hanson 
> wrote:
> >
> > I think the key difference between what Bill and Jake are saying is
> that Bill is saying a new feature needs approval in a more structured way.
> I think Bill's process is open the jira, then it is "approved" or "won't
> do" then work starts. I think what Jake is saying is a little less
> structured. That may be my reading though.
>
> The difference is between bugs vs features. We have a process for
> features, lazy consensus on RFCs. We have a process for minor
> features/improvements/tasks, greedy concensus on PR approvals.
>
> -Jake
>
>
>


Re: [Discuss] New feature work approval state in Geode project?

2021-05-28 Thread Kirk Lund
Another thought I have is that if the "fix" for a bug will require any sort
of new User API or configuration properties (ie gemfire.properties), then
we should always treat that as a new feature that requires going through
our RFC process for a new feature.

On Fri, May 28, 2021 at 2:42 PM Kirk Lund  wrote:

> I think the main problem this discussion thread could solve is when a
> developer files a Jira ticket as a bug, assigns it to themselves, and
> begins work on it. After some time (possibly even a month or something
> significant like that), they submit a PR to fix the ticket, only to find
> out from multiple reviewers that the community believes it is not a bug and
> rejects the PR while declaring that the bug isn't even a bug. The ticket is
> then closed as "will not fix" and the contributor is thoroughly demoralized
> (I would be).
>
> This community does seem to be lacking any sort of process for reviewing
> and triaging bugs unless the filer of the ticket were to discuss it on the
> dev-list after filing it but before coding starts.
>
> So, for bugs, this community could recommend optional discussion of
> newly filed bugs on the dev-list before beginning work on it to ensure that
> the majority don’t later disagree that it’s a bug during PR review. This
> would save the contributor a lot of wasted effort, or they might even be
> able to convince others that it is indeed a bug before starting work on it.
>
> -Kirk
>
> On Fri, May 28, 2021 at 11:14 AM Jacob Barrett 
> wrote:
>
>>
>>
>> > On May 28, 2021, at 10:48 AM, Bill Burcham 
>> wrote:
>> >
>> > A google search for "most successful open source project" lists Apache
>> > Cassandra at the top. When a bug is submitted to the Cassandra Jira it
>> > starts in the TRIAGE NEEDED state. Ostensibly after triage, it moves to
>> the
>> > OPEN state which is described as:
>> >
>> > | "the issue is open and ready for the assignee to start work on it."
>> >
>> > I think introducing a starting state like TRIAGE NEEDED in the Geode
>> Jira
>> > system, and instituting a (gating) triage process would be a good way to
>> > keep the Geode architecture cohesive and also to maximize the impact of
>> our
>> > contributor's efforts.
>>
>>
>> I think this is very different from what this thread was initially asking
>> for. An “approved” state is very different in my book than a bug needing
>> more info, “triage” state. I would definitely agree with this change to add
>> a “needs info”/“triage” state to bugs. It fits my previous statement that a
>> ticket is opened when there is reasonable belief work needs to be done. In
>> this case the initial work is to fully understand the issue. The next step
>> is to progress to fixing or closed.
>>
>> -Jake
>>
>>


Re: Cleaning up the codebase - use curly braces everywhere

2021-05-28 Thread Kirk Lund
+1 Also, I think if the entire work is done by IntelliJ then just mention
that in the PR description and I'm happy to add approval without stepping
through every line. I'd probably just pick a dozen random ones to check and
then approve it.

On Thu, May 27, 2021 at 6:36 PM Darrel Schneider  wrote:

> +1 thanks for working on this
> 
> From: Donal Evans 
> Sent: Thursday, May 27, 2021 10:22 AM
> To: dev@geode.apache.org 
> Subject: Cleaning up the codebase - use curly braces everywhere
>
> Hi Geode dev,
>
> I've recently been looking at ways to improve code quality in small ways
> throughout the codebase, and as a starting point, I thought it would be
> good to make it so that we're consistently using curly braces for control
> flow statements everywhere, since this is something that's specifically
> called out in the Geode Code Style Guide wiki page[1] as one of the "more
> important points" of our code style.
>
> IntelliJ has a "Run inspection by name..." feature that makes it possible
> to identify all places where curly braces aren't used for control flow
> statements, (which showed over 3300 occurrences in the codebase) and also
> allows them to be automatically inserted, making the fix relatively
> trivial. Since this PR will touch 640 files, I wanted to make sure to first
> check that this is something even worth doing, and, if there's agreement
> that it is, to give reviewers context on what the changes are, the
> motivation for them, and how they were made, to help with the review
> process.
>
> The draft PR I have up[2] currently has no failing tests and can be marked
> as ready to review if there aren't any objections, and once it is, I'll try
> to coordinate with codeowners to get the minimal number of approvals
> required for a merge (it looks like only 6-7 reviewers are needed, though
> I'm sure that almost every code owner will be tagged as reviewers given the
> number of files touched).
>
> If this idea is a success, I think it would be good to have a discussion
> about other low-hanging code improvements we could make using static
> analysis (unnecessary casts, unused variables, duplicate conditions etc.),
> and, once a particular inspection has been "fixed," possibly consider
> adding a check for it as part of the PR pre-checkin to make sure it's not
> reintroduced. All thoughts and feedback are very welcome.
>
> Donal
>
> [1]
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FCode%2BStyle%2BGuidedata=04%7C01%7Cdarrel%40vmware.com%7C55c1f47da50548d3fa5708d92134034c%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637577329548326057%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=BVvzQAdQ24ZuoOdoMrkGNJShmTkep4BGFduSAQu5H6o%3Dreserved=0
> [2]
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode%2Fpull%2F6523data=04%7C01%7Cdarrel%40vmware.com%7C55c1f47da50548d3fa5708d92134034c%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637577329548326057%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=gzta%2FwrhapOzp0DKcMEn1IR5XsNboWuMlUJwUwOrcGc%3Dreserved=0
>


Re: [Discuss] New feature work approval state in Geode project?

2021-05-28 Thread Kirk Lund
I think the main problem this discussion thread could solve is when a
developer files a Jira ticket as a bug, assigns it to themselves, and
begins work on it. After some time (possibly even a month or something
significant like that), they submit a PR to fix the ticket, only to find
out from multiple reviewers that the community believes it is not a bug and
rejects the PR while declaring that the bug isn't even a bug. The ticket is
then closed as "will not fix" and the contributor is thoroughly demoralized
(I would be).

This community does seem to be lacking any sort of process for reviewing
and triaging bugs unless the filer of the ticket were to discuss it on the
dev-list after filing it but before coding starts.

So, for bugs, this community could recommend optional discussion of
newly filed bugs on the dev-list before beginning work on it to ensure that
the majority don’t later disagree that it’s a bug during PR review. This
would save the contributor a lot of wasted effort, or they might even be
able to convince others that it is indeed a bug before starting work on it.

-Kirk

On Fri, May 28, 2021 at 11:14 AM Jacob Barrett  wrote:

>
>
> > On May 28, 2021, at 10:48 AM, Bill Burcham 
> wrote:
> >
> > A google search for "most successful open source project" lists Apache
> > Cassandra at the top. When a bug is submitted to the Cassandra Jira it
> > starts in the TRIAGE NEEDED state. Ostensibly after triage, it moves to
> the
> > OPEN state which is described as:
> >
> > | "the issue is open and ready for the assignee to start work on it."
> >
> > I think introducing a starting state like TRIAGE NEEDED in the Geode Jira
> > system, and instituting a (gating) triage process would be a good way to
> > keep the Geode architecture cohesive and also to maximize the impact of
> our
> > contributor's efforts.
>
>
> I think this is very different from what this thread was initially asking
> for. An “approved” state is very different in my book than a bug needing
> more info, “triage” state. I would definitely agree with this change to add
> a “needs info”/“triage” state to bugs. It fits my previous statement that a
> ticket is opened when there is reasonable belief work needs to be done. In
> this case the initial work is to fully understand the issue. The next step
> is to progress to fixing or closed.
>
> -Jake
>
>


Re: [VOTE] Requiring final keyword on every parameter and local variable

2021-04-22 Thread Kirk Lund
It's past the announced deadline so I've counted the results of our vote to
require 'final' keyword on all local variables and method/constructor
parameters.

I am interpreting this vote as a procedural issue [1], and I counted votes
by PMC members as "binding votes" and votes by all other participants as
"advisory votes" [2]. Please correct me if I've done any of this
incorrectly, and we'll try to get the results fixed.

Given that I stumbled during the beginning and started the VOTE before the
DISCUSS, I'm happy to start the process over if anyone believes that we
should repeat it. I've taken notes and I'm convinced I'll follow the
process correctly next time.

Voting status
=
+1: 2 binding votes
* Bill Burcham (PMC member)
* Jacob Barrett (PMC member)

+0: zero votes

-0: 1 binding vote, 1 advisory vote
* Jason Huynh (PMC member)
* Dale Emery (committer)

-1: 4 binding votes, 1 advisory vote
* Mark Hanson (PMC member)
* Patrick Johnson (contributor)
* Udo Kohlmeyer (PMC member)
* Kirk Lund (PMC member)
* John Blum (PMC member)
* Bruce Schuchardt (PMC member) - sorry, vote was past deadline

The voting fails with a majority of -1 votes.

[1] https://www.apache.org/foundation/voting.html
[2] https://projects.apache.org/committee.html?geode

Thanks everyone,
Kirk

(PS: should this result email be part of the original VOTE thread or should
it start a new thread for results?)

On Thu, Apr 22, 2021 at 9:53 AM Kirk Lund  wrote:

> Voting and discussion are closed for now. I'll count up the votes and post
> a summary soon.
>
> -Kirk
>
> On Thu, Apr 22, 2021 at 7:54 AM Bruce Schuchardt 
> wrote:
>
>> Kirk speaks my mind.  -1 on this requirement
>>
>> On 4/21/21, 11:29 AM, "Kirk Lund"  wrote:
>>
>> -1 as I've already explained in my DISCUSS thread for this topic, I
>> don't
>> think final gives sufficient benefit for local variables or method
>> parameters because it only prevents reassigning object pointers and
>> primitive values.
>>
>> I favor individuals making a personal decision on whether or not they
>> want
>> to use 'final' in this way without requiring others to do the same
>> (and NOT
>> requiring this use of 'final' in PR reviews, although "suggesting" is
>> ok).
>>
>> Note that JDK and all other libraries I looked through do not use
>> final in
>> this way.
>>
>> *I'm willing to change my vote to -0 if*:
>>
>>1. we qualify the requirement to only apply to method and
>> constructor
>>parameters (not local vars)
>>2. we find a way to introduce automation for enforcement (such as
>> we do
>>with RAT for license headers)
>>
>> *I'm willing to change my vote to +1 if *we decide to require 'final'
>> parameters (and even local vars) as part of a bigger shift towards
>> improving code quality by embracing Clean Code (ala Bob Martin):
>>
>>1. we commit to developing a robust and thorough coding standard
>> that
>>includes requiring true unit tests and passing in dependencies via
>> class
>>constructors or methods
>>2. we revisit our DOD and enforce it in PR reviews
>>
>> I recommend that contributors use IntelliJ inspections (or comparable
>> guideline for anyone who doesn't want to use IntelliJ) that will
>> assist us
>> in avoiding and fixing long methods when writing new methods and when
>> editing/refactoring existing methods (see below)
>> Method metrics (see IntelliJ > Preferences > Editor > Inspections):
>>
>>- Method with too many parameters
>>- Overly complex method
>>- Overly long method
>>- Overly nested method
>>
>> The inspections can be customized for Geode as long as the values
>> aren't
>> increased to the point of it being useless. For example, we would
>> have to
>> agree as a community what the "maximum number of parameters" is for
>> the
>> "Method with too many parameters" inspection.
>>
>> We could even export an inspections profile to be stored in the Geode
>> repo
>> and used by contributors.
>>
>> -Kirk
>>
>> On Wed, Apr 14, 2021 at 12:55 PM Kirk Lund  wrote:
>>
>> > Our coding standard and our Design Decisions does NOT require using
>> final
>> > on parameters and local variables.
>> >
>> > Please do NOT request that I add or keep final on parameters or
>> local
>> > variables unless the commun

Re: [VOTE] Requiring final keyword on every parameter and local variable

2021-04-22 Thread Kirk Lund
Voting and discussion are closed for now. I'll count up the votes and post
a summary soon.

-Kirk

On Thu, Apr 22, 2021 at 7:54 AM Bruce Schuchardt  wrote:

> Kirk speaks my mind.  -1 on this requirement
>
> On 4/21/21, 11:29 AM, "Kirk Lund"  wrote:
>
> -1 as I've already explained in my DISCUSS thread for this topic, I
> don't
> think final gives sufficient benefit for local variables or method
> parameters because it only prevents reassigning object pointers and
> primitive values.
>
> I favor individuals making a personal decision on whether or not they
> want
> to use 'final' in this way without requiring others to do the same
> (and NOT
> requiring this use of 'final' in PR reviews, although "suggesting" is
> ok).
>
> Note that JDK and all other libraries I looked through do not use
> final in
> this way.
>
> *I'm willing to change my vote to -0 if*:
>
>1. we qualify the requirement to only apply to method and
> constructor
>parameters (not local vars)
>2. we find a way to introduce automation for enforcement (such as
> we do
>with RAT for license headers)
>
> *I'm willing to change my vote to +1 if *we decide to require 'final'
> parameters (and even local vars) as part of a bigger shift towards
> improving code quality by embracing Clean Code (ala Bob Martin):
>
>1. we commit to developing a robust and thorough coding standard
> that
>includes requiring true unit tests and passing in dependencies via
> class
>constructors or methods
>2. we revisit our DOD and enforce it in PR reviews
>
> I recommend that contributors use IntelliJ inspections (or comparable
> guideline for anyone who doesn't want to use IntelliJ) that will
> assist us
> in avoiding and fixing long methods when writing new methods and when
> editing/refactoring existing methods (see below)
> Method metrics (see IntelliJ > Preferences > Editor > Inspections):
>
>- Method with too many parameters
>- Overly complex method
>- Overly long method
>- Overly nested method
>
> The inspections can be customized for Geode as long as the values
> aren't
> increased to the point of it being useless. For example, we would have
> to
> agree as a community what the "maximum number of parameters" is for the
> "Method with too many parameters" inspection.
>
> We could even export an inspections profile to be stored in the Geode
> repo
> and used by contributors.
>
> -Kirk
>
> On Wed, Apr 14, 2021 at 12:55 PM Kirk Lund  wrote:
>
> > Our coding standard and our Design Decisions does NOT require using
> final
> > on parameters and local variables.
> >
> > Please do NOT request that I add or keep final on parameters or local
> > variables unless the community votes and decides that every
> parameter and
> > local variable should require the final keyword. I coded this way in
> the
> > past and I found that it resulted in noisy code with no benefit. We
> can
> > argue about using this keyword all you want but the fact is I'm
> against it,
> > and I will not embrace it unless this community decides that we need
> to use
> > it.
> >
> > Using final on instance or class fields does have a concurrency
> benefit
> > and I support that only.
> >
> > If you want to add final to every single parameter and local var in
> the
> > Geode codebase, then now is your chance to vote on it. Please vote.
> >
> > Thanks,
> > Kirk
> >
> >
>
>


Re: [VOTE] Requiring final keyword on every parameter and local variable

2021-04-21 Thread Kirk Lund
I'm interested in https://projectlombok.org and I remember John Blum using
it on some projects. I would like to study it and do some perf testing on
it before supporting it for Geode though. In general, I don't like
generated code and it looks like at least some of the features involve
generated code -- that's the only potential downside for me.

-Kirk

On Mon, Apr 19, 2021 at 12:57 PM Owen Nichols  wrote:

> Any interest in adopting https://projectlombok.org in Geode?  Lombok
> allow a more readable "val" vs "var" syntax instead of "final" vs "" which
> could make it easier to do the right thing without "increasing visual
> noise".
>
> On 4/19/21, 12:40 PM, "Dale Emery"  wrote:
>
> My test for whether enforce a guideline in a PR review is: Would I be
> willing to automate enforcing it in CI?
>
> I am -0 on this particular guideline.
>
> IntelliJ offers two competing inspections for Java coding style:
>
>   *   Local variable or parameter can be final
>   *   Unnecessary 'final' on local variable or parameter
>
> Both of these are (I think) disabled by default in IntelliJ. And
> neither satisfies my “I’d be willing to automate enforcing it” test.
>
> Dale
>
> From: Mark Hanson 
> Date: Monday, April 19, 2021 at 11:08 AM
> To: dev@geode.apache.org 
> Subject: Re: [VOTE] Requiring final keyword on every parameter and
> local variable
> Hi Jake,
>
> I agree with everyone's point about final being a useful, I just don't
> find it useful enough to do anything manually across the code base at this
> point.
>
> I believe first in foremost in no code warnings. Intellij warns about
> variables that can be final, so I make them final as it finds them. It is
> very rare that I am writing new code at this point. I have spent the last
> year bug fixing other people's code.  From my standpoint, original intent
> is largely moot. So, for me, the question is do I go through code that is
> not mine and mark it all as final (where appropriate)? Sure, in code that I
> touch. In code the rest of the codebase, I think there are bigger fish to
> fry before we get to final.
>
> It seems like the larger portion of the consensus is to recommend that
> variables are marked final (where appropriate) as we find them or create
> them.
> That seems like the going forward approach consensus.
>
> I will be blunt though. This seems nitpicky *compared* to the number
> of code warnings there are and the fact that people are not actively fixing
> all of the warnings.
> If we don't come to the same consensus about all warnings this seems
> like painting a rusted car, sure it makes it all look the same, but does
> very little for the underlying problems. Now to undermine my own argument a
> little, I think that especially in release blockers, we want to touch as
> little code as possible, so I am flexible there.
>
> I would also like to agree with Udo about final really not being very
> good compared to Mutable and Immutable objects in other languages.
>
> Thanks,
> Mark
>
>
> On 4/15/21, 7:49 PM, "Udo Kohlmeyer"  wrote:
>
> @jake, you are correct, I did miss the LOCAL variable out of the
> subject. :face_palm:
>
> Yes, adding "final" to LOCAL variables will be HUGELY beneficial
> in this regard. Have we seen any performance improvement after having
> refactored this method? Have you been able to measure any improvements?
>
> --Udo
>
> On 4/15/21, 1:19 PM, "Jacob Barrett"  wrote:
>
>
>
> > On Apr 14, 2021, at 7:46 PM, Udo Kohlmeyer 
> wrote:
> > @Jake the idea of smaller methods is great and we should
> ALWAYS strive for that. But that argument is completely irrelevant in this
> discussion. As making method arguments final does not naturally guide a
> developer to creating smaller methods. Nor does a smaller method mean it
> can/will be jitted. Too many factors (to discuss here) are part of that
> decision, also it is not relevant in this discussion. But more on that
> topic read THIS.
>
> The original subject is in regards to parameters and local
> variables.
>
> Irrelevant is certainly an opinion you are welcome to have but
> let me challenge you. Goto DistributedCacheOperation._distribute().
>
> First challenge, look at around line 333:
> boolean reliableOp = isOperationReliable() &&
> region.requiresReliabilityCheck();
>
> Without scrolling do you see that variable used? Nope, because
> it is first used on line 439, ~100 lines away. Does it mutate between
> there, well I can search for all uses and find out or I could be nice to
> the next person and intend for it to never mutate by adding final. Intent
> communicated!
>
> Second challenge, mark all the local variables in the method
> as final. Now make it compile without introducing more mutable variables.
> At the end of this journey you will have about a dozen unit 

Re: [VOTE] Requiring final keyword on every parameter and local variable

2021-04-21 Thread Kirk Lund
-1 as I've already explained in my DISCUSS thread for this topic, I don't
think final gives sufficient benefit for local variables or method
parameters because it only prevents reassigning object pointers and
primitive values.

I favor individuals making a personal decision on whether or not they want
to use 'final' in this way without requiring others to do the same (and NOT
requiring this use of 'final' in PR reviews, although "suggesting" is ok).

Note that JDK and all other libraries I looked through do not use final in
this way.

*I'm willing to change my vote to -0 if*:

   1. we qualify the requirement to only apply to method and constructor
   parameters (not local vars)
   2. we find a way to introduce automation for enforcement (such as we do
   with RAT for license headers)

*I'm willing to change my vote to +1 if *we decide to require 'final'
parameters (and even local vars) as part of a bigger shift towards
improving code quality by embracing Clean Code (ala Bob Martin):

   1. we commit to developing a robust and thorough coding standard that
   includes requiring true unit tests and passing in dependencies via class
   constructors or methods
   2. we revisit our DOD and enforce it in PR reviews

I recommend that contributors use IntelliJ inspections (or comparable
guideline for anyone who doesn't want to use IntelliJ) that will assist us
in avoiding and fixing long methods when writing new methods and when
editing/refactoring existing methods (see below)
Method metrics (see IntelliJ > Preferences > Editor > Inspections):

   - Method with too many parameters
   - Overly complex method
   - Overly long method
   - Overly nested method

The inspections can be customized for Geode as long as the values aren't
increased to the point of it being useless. For example, we would have to
agree as a community what the "maximum number of parameters" is for the
"Method with too many parameters" inspection.

We could even export an inspections profile to be stored in the Geode repo
and used by contributors.

-Kirk

On Wed, Apr 14, 2021 at 12:55 PM Kirk Lund  wrote:

> Our coding standard and our Design Decisions does NOT require using final
> on parameters and local variables.
>
> Please do NOT request that I add or keep final on parameters or local
> variables unless the community votes and decides that every parameter and
> local variable should require the final keyword. I coded this way in the
> past and I found that it resulted in noisy code with no benefit. We can
> argue about using this keyword all you want but the fact is I'm against it,
> and I will not embrace it unless this community decides that we need to use
> it.
>
> Using final on instance or class fields does have a concurrency benefit
> and I support that only.
>
> If you want to add final to every single parameter and local var in the
> Geode codebase, then now is your chance to vote on it. Please vote.
>
> Thanks,
> Kirk
>
>


Re: [VOTE] Requiring final keyword on every parameter and local variable

2021-04-19 Thread Kirk Lund
I'm planning to leave this VOTE and DISCUSSION open until Wednesday 2400 PM
UTC.

On Wed, Apr 14, 2021 at 12:55 PM Kirk Lund  wrote:

> Our coding standard and our Design Decisions does NOT require using final
> on parameters and local variables.
>
> Please do NOT request that I add or keep final on parameters or local
> variables unless the community votes and decides that every parameter and
> local variable should require the final keyword. I coded this way in the
> past and I found that it resulted in noisy code with no benefit. We can
> argue about using this keyword all you want but the fact is I'm against it,
> and I will not embrace it unless this community decides that we need to use
> it.
>
> Using final on instance or class fields does have a concurrency benefit
> and I support that only.
>
> If you want to add final to every single parameter and local var in the
> Geode codebase, then now is your chance to vote on it. Please vote.
>
> Thanks,
> Kirk
>
>


Re: [DISCUSS] Requiring final keyword on every parameter and local variable

2021-04-15 Thread Kirk Lund
PS: Here is my version of that constructor after refactoring the class:

PartitionedRegionClearMessage(Set recipients,
PartitionedRegion region,
ReplyProcessor21 processor, PartitionedRegionClearMessage.OperationType
operationType,
RegionEventImpl event) {
  super(recipients, region.getPRId(), processor);
  partitionedRegion = region;
  this.operationType = operationType;
  callbackArgument = event.getRawCallbackArgument();
  eventId = event.getEventId();
}

And I could rename "region" to "partitionedRegion", etc.

On Thu, Apr 15, 2021 at 12:00 PM Kirk Lund  wrote:

> Here's a quote about code reviews in Clean Code in Python:
>
> We should invest time in code reviews, thinking about what is good code,
>> and how readable and understandable it is. When looking at the code written
>> by a peer, you should ask such questions as:
>> * Is this code easy to understand and follow to a fellow programmer?
>> * Does it speak in terms of the domain of the problem?
>> * Would a new person joining the team be able to understand it, and work
>> with it effectively?
>
>
> And I would add in:
> * Does the code follow community adopted coding standards?
> * Is the code complete from a DOD (Definition Of Done) perspective?
>
> Asking the contributor to add back in a "final" keyword on one parameter
> of a refactored method or constructor that is now small and easy to
> understand at a glance is a useful review in my opinion.
>
> If using "final" keyword on all method/constructor parameters and/or local
> variables becomes part of our coding standard, then it's a valid review
> because it's required and missing.
>
> If using "final" is NOT required and I remove "final" from the regionEvent
> parameter of the following constructor of a new class...
>
> PartitionedRegionClearMessage(Set recipients,
> PartitionedRegion region,
> ReplyProcessor21 processor,
> PartitionedRegionClearMessage.OperationType operationType,
> final RegionEventImpl event) {
>   super(recipients, region.getPRId(), processor);
>   partitionedRegion = region;
>   op = operationType;
>   cbArg = event.getRawCallbackArgument();
>   eventID = event.getEventId();
> }
>
> ...then I'm saying it's not helpful and not desirable for a review to tell
> me to add that "final" keyword back in with a "Request Changes" on the PR.
> I also do NOT want add "final" keyword to all of the other parameters for
> consistency because it just adds a bunch of words without adding any value.
> This kind of review offends me, because I am striving for high quality
> clean code with short methods and constructors for the highest readability
> and maintainability that I can achieve.
>
> -Kirk
>
>>


Re: [DISCUSS] Requiring final keyword on every parameter and local variable

2021-04-15 Thread Kirk Lund
Here's a quote about code reviews in Clean Code in Python:

We should invest time in code reviews, thinking about what is good code,
> and how readable and understandable it is. When looking at the code written
> by a peer, you should ask such questions as:
> * Is this code easy to understand and follow to a fellow programmer?
> * Does it speak in terms of the domain of the problem?
> * Would a new person joining the team be able to understand it, and work
> with it effectively?


And I would add in:
* Does the code follow community adopted coding standards?
* Is the code complete from a DOD (Definition Of Done) perspective?

Asking the contributor to add back in a "final" keyword on one parameter of
a refactored method or constructor that is now small and easy to understand
at a glance is a useful review in my opinion.

If using "final" keyword on all method/constructor parameters and/or local
variables becomes part of our coding standard, then it's a valid review
because it's required and missing.

If using "final" is NOT required and I remove "final" from the regionEvent
parameter of the following constructor of a new class...

PartitionedRegionClearMessage(Set recipients,
PartitionedRegion region,
ReplyProcessor21 processor, PartitionedRegionClearMessage.OperationType
operationType,
final RegionEventImpl event) {
  super(recipients, region.getPRId(), processor);
  partitionedRegion = region;
  op = operationType;
  cbArg = event.getRawCallbackArgument();
  eventID = event.getEventId();
}

...then I'm saying it's not helpful and not desirable for a review to tell
me to add that "final" keyword back in with a "Request Changes" on the PR.
I also do NOT want add "final" keyword to all of the other parameters for
consistency because it just adds a bunch of words without adding any value.
This kind of review offends me, because I am striving for high quality
clean code with short methods and constructors for the highest readability
and maintainability that I can achieve.

-Kirk

>


[DISCUSS] Requiring final keyword on every parameter and local variable

2021-04-15 Thread Kirk Lund
Moving the discussion to this thread. Sorry for starting off with a VOTE.

I'm primarily interested in Maintainability
https://en.wikipedia.org/wiki/Software_quality#Maintainability, and my
concern regarding using "final" on method parameters or local variables is
about Code Readability: understanding what the code does at a glance and
being able to spot bugs and mistakes more easily.

Infrequently used keywords such as "final" on method parameters and local
variables cause "noise" for me such that it slows down my understanding of
what the code does. I find that I have to spend more time studying it and
it takes longer than a "glance."

Now, if the keyword buys us enough to justify the "noise" then it's worth
it (such as for long methods), but I am arguing that small methods of a
dozen or less lines do NOT benefit from having "final" on any of its
variables or parameters.

I also really, really, really want to encourage this community to refactor
the code. In other words, if you touch a long method, PLEASE refactor it to
break it up into smaller, well-named methods with descriptive parameters
and local variables that are only ever assigned once. Then each method is
very easy to understand at a glance and we no longer need "final" on any of
its vars or params to indicate that it is never reassigned... because we
know at a glance that it's only assigned once.

Developers also have a tendency to look at code and then emulate it. If
they're looking at phenomenally long methods that aren't unit tested and
have short non-descriptive var names, then they're going to repeat that
both when maintaining such code and when writing new code. We don't want
that, or at least I don't want that.

Consistency (such as always using "final" on all method parameters) results
in the developer getting used to it and expecting it so that you recover
some of that speed in understanding what you're studying without stumbling
over "noise" or unnecessary syntax and structures.

I believe that simplicity and understandability is more important than
having the "final" keyword on just one out of five parameters on one
method. As I refactor the code, I make sure that the methods end up short
and succinct so that "final" is no longer needed on that one parameter.

Introducing a coding standard would help us immensely. The standard can
include important things like:
* Use descriptive variable and method names that are spelled out.
* Keep the methods short.

There are tons of best practices, corrections for code smells, and
conventions for highly readable and maintainable code that we could and
should be adopting.

The bottom line is I don't really care about "final" -- using it or NOT
using it -- what I care about is code quality and making the code readable
and maintainable while following a well-thought out and group-written
coding standard.

-Kirk


Re: [VOTE] Requiring final keyword on every parameter and local variable

2021-04-15 Thread Kirk Lund
Sorry Bruce, that's my lack of Apache finesse. I know you might expect me
to remember it but I don't start discussions or votes often enough to
remember. I'll be sure to read up on this process before starting any
threads in the future.

On Thu, Apr 15, 2021 at 7:32 AM Bruce Schuchardt  wrote:

> I agree with Udo.  Also, this shouldn't be a VOTE without discussion and I
> don't see a DISCUSS thread.
>
> On 4/14/21, 7:46 PM, "Udo Kohlmeyer"  wrote:
>
> "+1" to ENCOURAGING developers to make "final" a requirement for
> method arguments.
> "-1" to making it a hard rule.
>
> If we want to enforce this rule on the basis of readability or
> performance, I fear that we might be beating the wrong horse here for the
> wrong reasons!!
>
> Short reasoning:
>
> If we really want to effect real lasting change, is to adopt
> principles that Rust (object owners) and Kotlin (immutable collections)
> make core to the language. Immutable POJOs and possibly an extension for
> collections to make intent of immutability clear would be far more
> effective. As code reviewers would be able to look at a method definition
> and instantly know what arguments are mutable.
>
> Adding "final" on fields, methods and classes has a far greater effect
> than that of method arguments. If we intent to enforce it based on
> readability, then we should really consider applying different approaches
> (as stated in the paragraph above) that can be used with far greater
> effectiveness.
>
> Longer reasoning:
>
> I think all arguments relating to "indicating authors intent" or
> "variable  does not change" are great, but in reality we as developers
> should always assume that this was the intent. I think if we really want to
> improve our code base, let's opt for IMMUTABLE data objects. The "final"
> keyword has no benefit if you pass around POJOs (including Collections). As
> long as you don't change the instance reference, the "final" keyword has
> absolutely no effect. It does not mean that you cannot change every field
> internal of the POJO. The "final" keyword is only effective against the
> re-assignment of that variable/argument within a method. But a better
> practice is to avoid the re-assignment of any variable/argument in the
> first place.
>
> Kotlin has a great language (one of many) that method/function
> arguments are naturally final and this behavior cannot be changed. In
> addition there is a the feature that the collections are immutable and as a
> developer you have to be explicit if you want a collection to be mutable,
> by defining the collection as a "Mutable*" i.e MutableMap, MutableSet,
> MutableList
>
> @Jake the idea of smaller methods is great and we should ALWAYS strive
> for that. But that argument is completely irrelevant in this discussion. As
> making method arguments final does not naturally guide a developer to
> creating smaller methods. Nor does a smaller method mean it can/will be
> jitted. Too many factors (to discuss here) are part of that decision, also
> it is not relevant in this discussion. But more on that topic read THIS.
>
> --Udo
>
> On 4/15/21, 9:29 AM, "Jacob Barrett"  wrote:
>
> If a method is longer than a handful of lines and I go in to
> refactor it I am going to start by making every variable I find final. Then
> I am going to figure out how to keep them final. By doing so you naturally
> produce smaller functional methods that are usually independently unit
> testable. Smaller methods can get jitted. Smaller methods can take less
> time to reach a safe point (see time to safe point issues in Java).
>
> For this reason I strongly prefer on final everywhere and if you
> think you need a variable to not be final you should use that as an
> indicator to refractor your code.
>
> So, +1 for me for final everywhere.
>
> Also, please stop declaring variables miles away from their use.
>
> > On Apr 14, 2021, at 12:55 PM, Kirk Lund 
> wrote:
> >
> > Our coding standard and our Design Decisions does NOT require
> using final
> > on parameters and local variables.
> >
> > Please do NOT request that I add or keep final on parameters or
> local
> > variables unless the community votes and decides that every
> parameter and
> > local variable should require the final keyword. I coded this
> way in the
> > past and I found that it resulted in noisy code with no benefit.
> We can
> > argue about using this keyword all you want b

[VOTE] Requiring final keyword on every parameter and local variable

2021-04-14 Thread Kirk Lund
Our coding standard and our Design Decisions does NOT require using final
on parameters and local variables.

Please do NOT request that I add or keep final on parameters or local
variables unless the community votes and decides that every parameter and
local variable should require the final keyword. I coded this way in the
past and I found that it resulted in noisy code with no benefit. We can
argue about using this keyword all you want but the fact is I'm against it,
and I will not embrace it unless this community decides that we need to use
it.

Using final on instance or class fields does have a concurrency benefit and
I support that only.

If you want to add final to every single parameter and local var in the
Geode codebase, then now is your chance to vote on it. Please vote.

Thanks,
Kirk


Precheckin job failed to fork JVM during compile

2021-03-25 Thread Kirk Lund
https://concourse.apachegeode-ci.info/builds/17742 core dumped during
compilation of one of the modules.

I believe exit code 134 means that it failed to fork a new Java VM.

I wonder what would cause a test job to be unable to fork a process. It
seems to have forked processes for compiling all of the other modules, so
it's not a persistent problem.

> Task :geode-membership:compileJava
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (methodData.cpp:1252), pid=8491, tid=0x7f1716a78700
#  fatal error: unexpected tag 32
#
# JRE version: OpenJDK Runtime Environment (8.0_282-b08) (build
1.8.0_282-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.282-b08 mixed mode linux-amd64
compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/geode/geode/geode-membership/hs_err_pid8491.log

<<>>

> Task :geode-membership:compileJava
#
# Compiler replay data is saved as:
# /home/geode/geode/geode-membership/replay_pid8491.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp

<<>>

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':geode-membership:compileJava'.
> Compilation failed with exit code 134; see the compiler error output for
details.


Last 3 unit tests using PowerMock

2021-03-22 Thread Kirk Lund
We are down to 3 unit tests that use PowerMock. I'm going to start work on
these 3 today.

geode-core/src/test/java/org/apache/geode/internal/offheap/OffHeapRegionEntryHelperJUnitTest.java
geode-core/src/test/java/org/apache/geode/internal/offheap/ReferenceCountHelperImplTest.java
geode-core/src/test/java/org/apache/geode/internal/offheap/ReferenceCountHelperJUnitTest.java

Please let me know if anyone has already started on any of these. If I
don't hear back, I'll try to have a PR ready within a couple days.

Thanks,
Kirk


Re: org.jetbrains.annotations does not exist

2021-03-17 Thread Kirk Lund
File | Invalidate Caches / Restart... fixed it. I haven’t had to invalidate
caches for over a year and forgot all about it.

Thanks,
Kirk

On Wed, Mar 17, 2021 at 4:01 PM Kirk Lund  wrote:

> I'm trying to build in IntelliJ (funny enough) and getting these failures:
>
>
> /Users/klund/dev/geode1/geode-core/src/main/java/org/apache/geode/internal/cache/tx/RemoteOperationMessage.java
> Error:(25, 33) java: package org.jetbrains.annotations does not exist
> Error:(299, 4) java: cannot find symbol
>   symbol:   class NotNull
>   location: class org.apache.geode.internal.cache.tx.RemoteOperationMessage
>
> /Users/klund/dev/geode1/geode-core/src/main/java/org/apache/geode/internal/cache/tier/sockets/SocketMessageWriter.java
> Error:(25, 33) java: package org.jetbrains.annotations does not exist
> Error:(39, 8) java: cannot find symbol
>   symbol:   class Nullable
>   location: class
> org.apache.geode.internal.cache.tier.sockets.SocketMessageWriter
>
> Googling just has results saying that we need to add jetbrains annotations
> to our build.
>
> I can fully build on the command line, but IntelliJ refuses to build even
> after refreshing the Gradle plugin and doing a clean rebuild.
>
> Any ideas how to get my IntelliJ to build Geode with these jetbrains
> annotations?
>
> Thanks,
> Kirk
>
>


org.jetbrains.annotations does not exist

2021-03-17 Thread Kirk Lund
I'm trying to build in IntelliJ (funny enough) and getting these failures:

/Users/klund/dev/geode1/geode-core/src/main/java/org/apache/geode/internal/cache/tx/RemoteOperationMessage.java
Error:(25, 33) java: package org.jetbrains.annotations does not exist
Error:(299, 4) java: cannot find symbol
  symbol:   class NotNull
  location: class org.apache.geode.internal.cache.tx.RemoteOperationMessage
/Users/klund/dev/geode1/geode-core/src/main/java/org/apache/geode/internal/cache/tier/sockets/SocketMessageWriter.java
Error:(25, 33) java: package org.jetbrains.annotations does not exist
Error:(39, 8) java: cannot find symbol
  symbol:   class Nullable
  location: class
org.apache.geode.internal.cache.tier.sockets.SocketMessageWriter

Googling just has results saying that we need to add jetbrains annotations
to our build.

I can fully build on the command line, but IntelliJ refuses to build even
after refreshing the Gradle plugin and doing a clean rebuild.

Any ideas how to get my IntelliJ to build Geode with these jetbrains
annotations?

Thanks,
Kirk


Avoid PowerMockito

2021-03-09 Thread Kirk Lund
I just reviewed a PR that was adding a unit test using PowerMockito. We've
had lots of problems with PowerMockito leaving the unit testing JVM
corrupted for later tests. Using PowerMockito also discourages us from
refactoring product code to have better design and be easier to unit test.
So in previous threads here on dev-list, the community decided to axe our
usage of PowerMockito.

There are lots of Jira tickets about PowerMockito in Geode:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20GEODE%20AND%20text%20~%20%22PowerMockito%22

There is one open ticket for removing PowerMockito from Geode:
https://issues.apache.org/jira/browse/GEODE-6143

Unfortunately the assignee is no longer active in this community so we need
someone or everyone to pitch in. If you find yourself working on an area of
code that has a unit test using PowerMockito, please rewrite the test using
regular Mockito and refactor the code that it tests so that you can pass
all of its dependencies in via the constructor(s).

If anyone would like to volunteer to take on GEODE-6143, please feel free
to reassign it. I would be happy to help you out.

Thanks,
Kirk


Re: [Proposal] Backport GEODE-8886 to 1.14

2021-03-08 Thread Kirk Lund
+1

On Fri, Mar 5, 2021 at 3:58 PM Owen Nichols  wrote:

> Added to blocker board, thanks
>
> On 3/5/21, 3:15 PM, "Jianxia Chen"  wrote:
>
> +1
>
> On Fri, Mar 5, 2021 at 2:48 PM Mark Hanson  wrote:
>
> > Hi All,
> >
> > It was discovered that there was a bug with unprocessed events under
> > certain conditions with WAN specifically when old and new servers
> are vying
> > for primary status.
> > This fix will alleviate the problem in 1.14. The code that caused the
> > problem is present in 1.14, hence the backport request.
> >
> > Thanks,
> > Mark
> >
>
>


Re: [Proposal] Backport GEODE-8958 into 1.14.x, 1.13.x, 1.12.x branches

2021-03-03 Thread Kirk Lund
+1

On Wed, Mar 3, 2021 at 9:51 AM Jianxia Chen  wrote:

> +1
>
> On Wed, Mar 3, 2021 at 9:44 AM Darrel Schneider  wrote:
>
> > +1
> > 
> > From: Xiaojian Zhou 
> > Sent: Wednesday, March 3, 2021 9:43 AM
> > To: dev@geode.apache.org 
> > Subject: Re: [Proposal] Backport GEODE-8958 into 1.14.x, 1.13.x, 1.12.x
> > branches
> >
> > +1
> >
> > On 3/1/21, 11:30 PM, "Owen Nichols"  wrote:
> >
> > That sounds a lot better than never expiring them if that does
> happen,
> > I think this would be good to include.
> >
> > On 3/1/21, 2:41 PM, "Mark Hanson"  wrote:
> >
> > I would like to backport GEODE-8958 into previous release
> branches
> > to alleviate a problem with tombstones if timestamps become corrupt for
> > some reason.
> >
> > Thanks,
> > Mark
> >
> >
> >
>


Re: [PROPOSAL] backport fix for GEODE-8920 to 1.14.0

2021-03-03 Thread Kirk Lund
+1 to back-port GEODE-8920 fix

On Mon, Mar 1, 2021 at 1:23 PM Owen Nichols  wrote:

> Sounds reasonable for same reasons as GEODE-8919.  I've added it to 1.14.0
> blocker board.
>
> On 3/1/21, 1:20 PM, "Kamilla Aslami"  wrote:
>
> Hi all,
>
> I would like to back-port the fix for GEODE-8920 to 1.14.0. It
> modified debug logging in DirectChannel to make it easier to trace a
> message. This change is low-risk and important for debugging.
>
> Thanks,
> Kamilla
>
>


Examples of good Mocking in tests

2021-02-19 Thread Kirk Lund
There are quite a few examples of good mocking in both unit tests and
integration tests, with the majority being in unit tests. An easy way to
find many of these tests in Geode is to grep for uses of MockitoRule with
STRICT_STUBS.

Also, remember that if the test touches disk or network then it's an
integration test and should be in src/integrationTest instead of src/test
in the module.

We really should move away from the old Fakes class that is in geode-junit.
It's hard-wired to fake everything in Geode (Cache and DistributedSystem,
top-to-bottom) and mocks implementation classes instead of pushing us
towards using and mocking interfaces. If you find yourself modifying a unit
test that uses Fakes, please consider updating that test to not use it.

A good unit test should just directly use our preferred testing libraries
of Mockito and AssertJ, be self-contained other than using simple JUnit
Rules. Unit tests should perform validation via the APIs it exposes to
other classes where possible or via package-private methods annotated
with @VisibleForTesting if needed. The latter helps quite a bit to get
existing legacy code under test.

Cheers,
Kirk


Fixed flaky dunit test AsyncInvocationTimeoutDistributedTest

2020-12-05 Thread Kirk Lund
I just merged #5799 to develop. This
fixes AsyncInvocationTimeoutDistributedTest which is one of the top 3
flakiest dunit tests.

Mass test results: 21% flaky overall, top three offenders:
1) GEODE-8573: SerialGatewaySenderOperationsDistributedTest (8%)
2) GEODE-7710: JMXMBeanReconnectDUnitTest (4%)
3) GEODE-8634: AsyncInvocationTimeoutDistributedTest (3%)

Cheers,
Kirk


Committing broken javadocs to develop

2020-11-20 Thread Kirk Lund
Please remember to:
* fix all broken javadocs before submitting a PR
* always review javadocs when reviewing PRs

Thanks,
Kirk


Re: PR process and etiquette

2020-10-28 Thread Kirk Lund
How about we make this a recommendation rather than a rule?

I'd like to also recommend that contributors consider prefixing the PR
title with "DRAFT: " while it is in draft. This just makes it easier to see
at a glance that it's a draft. When I change the PR to "ready for review" I
edit the title to remove "DRAFT: ".

On Wed, Oct 28, 2020 at 9:59 AM Bruce Schuchardt  wrote:

> Hi Owen - I wasn't aware that non-committers can't add reviewers to their
> PRs but I don't see how using DRAFT mode helps with that.  The idea that I
> can't request a review until the commit checks all pass seems absurd to me.
>
> On 10/28/20, 9:15 AM, "Owen Nichols"  wrote:
>
> Hey Bruce, please consider that non-committers are not permitted to
> add reviewers themselves, so a consistent convention to indicate when a PR
> has moved from work-in-progress to ready-for-review will help alert the
> community when to assign reviewers.
>
> Currently, I see countless creative variations on people adding "WIP"
> or "DO NOT REVIEW" or other text somewhere in the summary or description.
> I think the suggestion here is to standardize on using Draft status as the
> canonical way to communicate this.
>
> Also worth noting, you can change a PR back to 'draft' mode at any
> time (so whether you forgot initially, or a test failure or review feedback
> made you realize more work is needed, you can always amend the draft status
> to communicate whether you're still working, or ready for review again).
>
> I'm not sure why you think this is going to make it more difficult to
> attract and retain new contributors.  The point is to make it easier for
> new contributors to get their PRs reviewed, and give reviewers more
> confident that their time is appreciated.
>
> I see your point that as a committer, you have no need to communicate
> this status, since you can simply add reviewers yourself when you're
> ready.  Asking everyone to use draft status is not the ideal workaround for
> the GitHub flaw that non-committers can't request reviewers, but it also
> takes only 2 seconds of your time.  Expecting new contributors to email the
> dev list every time they want to request a review of their PR seems far
> more problematic in term of attracting and retaining new contributors.
>
> On 10/28/20, 8:10 AM, "Bruce Schuchardt"  wrote:
>
> -1
> While I often use the Draft option I don't see why we want to add
> even more rules about how we use github.  I think it's enough to put in a
> PR and then add reviewers when you're ready for comments.  Getting the
> stink-eye for putting up a non-Draft PR is just going to make it more
> difficult to attract and retain new contributors.
>
> On 10/27/20, 5:41 PM, "Udo Kohlmeyer"  wrote:
>
> Dear Apache Geode Devs,
> It is really great going through all the PRs that been
> submitted. As Josh Long is known to say: "I work for PRs".
> Whilst going through some of the PRs I do see that there are
> many PRs that have multiple commits against the PR.
> I know that the PR submission framework kicks off more testing
> than we do on our own local machines. It is extremely uncommon to submit a
> PR the first time and have all tests go green. Which h means we invariably
> iterate over commits to make the build go green.
> In this limbo time period, it is hard for any reviewer to know
> when the ticket is ready to be reviewed.
> I want to propose that when submitting a PR, it is initially
> submitted as a DRAFT PR. This way, all test can still be run and work can
> be done to make sure "green" is achieved. Once "green" status has been
> achieved, the draft can then be upgraded to a final PR by pressing the
> "Ready For Review" button. At this point all commits on the branch can then
> once again be squashed into a single commit.
> Now project committers will now know that the PR is in a state
> that it can be reviewed for completeness and functionality.
> In addition, it will help tremendously helpful if anyone
> submitting a PR monitors their PR for activity. If there is no activity for
> a few days, please feel free to ping the Apache Geode Dev community for
> review. If review is request, please prioritize some time to address the
> feedback, as reviewers spend time reviewing code and getting an
> understanding what the code is doing. If too much time goes by, between
> review and addressing the review comments, not only does the reviewer lose
> context, possibly requiring them to spend time again to understand what the
> code was/is supposed to do, but also possibly lose interest, as the ticket
> has now become cold or dropped down the list of PRs.
> There are currently many PRs that are in a cold state, where
> the time between review and response has been so long, that both parties
> (reviewer and submitter) have forgotten about the PR.
> In the case that 

Full diffs in Jira comments

2020-10-28 Thread Kirk Lund
Anyone else notice that we're getting full diffs as comments to Jira
tickets again?


Please review PR #5667

2020-10-27 Thread Kirk Lund
GEODE-8647: Support multiple instances of DistributedMap #5667
https://github.com/apache/geode/pull/5667

Thanks,
Kirk


Re: [PROPOSAL] Remove "Fix Version/s" and "Sprint" from Jira "Create Issue" dialogue and include "Affects Version/s"

2020-08-17 Thread Kirk Lund
+1 if it's possible

On Mon, Aug 17, 2020 at 12:04 PM Donal Evans  wrote:

> Looking at the dialogue that opens when you attempt to create a new ticket
> in the GEODE Jira[1], there are two fields included that aren't really
> necessary and may cause confusion. The "Fix Version/s" field should
> presumably not be filled out until the issue has actually been fixed,
> rather than at the time of ticket creation. The "Sprint" field seems to no
> longer serve any purpose at all that I can discern, having only been filled
> in 13 tickets, the most recent of which was created in December 2018[2].
> With the expansion of the community contributing to the Geode project, it's
> important to provide a straightforward experience for people who are new to
> the project and wish to file tickets, so the presence of these fields may
> cause issues.
>
> I propose that these two fields be removed from the "Create Issue"
> dialogue and that the "Affects Version/s" field be added, since that field
> is far more important at time of ticket creation. There are currently 3851
> bug tickets in the Jira with no "Affects Version/s" value entered at
> all[3], which I suspect is in part due to that field not being an option in
> the "Create Issue" dialogue, meaning you have to remember to go back after
> creating the ticket and enter it. With Geode moving to a model of having
> support branches and patch releases, properly capturing the versions
> affected by a given issue becomes even more important.
>
> [1] https://i.imgur.com/oQ8CW87.png
> [2]
> https://issues.apache.org/jira/projects/GEODE/issues/GEODE-8433?filter=allissues=cf%5B12310921%5D+ASC%2C+created+DESC
> [3]
> https://issues.apache.org/jira/browse/GEODE-8433?jql=project%20%3D%20GEODE%20AND%20issuetype%20%3D%20Bug%20AND%20affectedVersion%20%3D%20EMPTY%20ORDER%20BY%20created%20DESC%2C%20affectedVersion%20ASC%2C%20cf%5B12310921%5D%20ASC
>


Using IgnoredException in Distributed Tests

2020-08-14 Thread Kirk Lund
Hi Geode devs,

I published a short article for Geode developers describing how to use
IgnoredException in Distributed Tests (DUnit Tests):

https://kirklund.github.io/using-ignoredexception-in-distributed-tests/

Feel free to request topics. I'll try to publish more developer articles on
whatever topics are most popular.

Thanks,
Kirk


Re: TimeIntegrationTest is flaky

2020-08-13 Thread Kirk Lund
Thanks guys!

On Thu, Aug 13, 2020 at 10:47 AM Raymond Ingles  wrote:

> Yes, we're on it. Thanks!
>
> On 8/12/20, 7:25 PM, "Kirk Lund"  wrote:
>
> Since this is a new test, can we please prioritize fixing it?
>
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-8379data=02%7C01%7Cringles%40vmware.com%7C9c67a42631c6498cba3f08d83f16f959%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637328715199279184sdata=B2F0qmBfVZ8Yfzo3U3J5inQW9b7eRu0EO44gSVFtKfU%3Dreserved=0
>
> java.lang.AssertionError:
> Expecting:
>  <0L>
> to be greater than:
>  <0L>
> at
>
> org.apache.geode.redis.internal.executor.server.TimeIntegrationTest.timeCommandRespondsWIthTwoValues(TimeIntegrationTest.java:57)
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
>
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:566)
> at
>
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at
>
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
>
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at
>
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
>
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at
>
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at
>
> org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at
>
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at
>
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at
>
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
> at
>
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
> at
>
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
> at
>
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
> at
>
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
>
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:566)
> at
>
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at
>
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at
>
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
> at
>
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
> at
>
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
>
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:566)
> at
>
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> a

TimeIntegrationTest is flaky

2020-08-12 Thread Kirk Lund
Since this is a new test, can we please prioritize fixing it?

https://issues.apache.org/jira/browse/GEODE-8379

java.lang.AssertionError:
Expecting:
 <0L>
to be greater than:
 <0L>
at
org.apache.geode.redis.internal.executor.server.TimeIntegrationTest.timeCommandRespondsWIthTwoValues(TimeIntegrationTest.java:57)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at
org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
at java.lang.Thread.run(Thread.java:834)


Re: Flaky test caused by missing JDK dependency

2020-07-17 Thread Kirk Lund
Closing out this discussion thread about GEODE-6183:

We believe that the machine performed an update of Java during the test.
This caused the tools.jar to be unavailable only while this test was
executing. There are many other tests that use the Attach API which passed
in this overall CI job suggesting that it was only a momentary problem.

I'm not sure how easy or practical it is to turn off Java updates on
instances running CI jobs. I'll leave that to others to decide.

On Wed, Jul 8, 2020 at 1:30 PM Kirk Lund  wrote:

> The Attach API is optional for Users running the product.
>
> The Attach API is required to compile the classes that use the Attach API
> and to run tests that cover this feature (such as "--pid").
>
> On Wed, Jul 8, 2020 at 12:11 PM Anthony Baker  wrote:
>
>> I thought we made the dependency on the Attach API optional when we added
>> support for JDK 11?
>>
>> Anthony
>>
>>
>> > On Jul 8, 2020, at 10:17 AM, Kirk Lund  wrote:
>> >
>> > To transition away from Attach API, the community needs a proposal to
>> do so
>> > and we'll need to deprecate the GFSH options that depend on Attach API
>> such
>> > as "--pid" in "status server --pid 20938". Even then we're looking at a
>> > minimum of one major release before we can remove options after they are
>> > deprecated.
>> >
>> > We haven't had a major release in 4+ years so don't hold your breath! :)
>> >
>> > On Wed, Jul 8, 2020 at 9:59 AM Sean Goller  wrote:
>> >
>> >> The Liberica JDK does not include the Attach API. I'm investigating
>> why.
>> >> Given the inherent insecurity of this API, I recommend we transition
>> away
>> >> from using it.
>> >> 
>> >> From: Kirk Lund 
>> >> Sent: Monday, July 6, 2020 10:36 AM
>> >> To: dev@geode.apache.org 
>> >> Subject: Flaky test caused by missing JDK dependency
>> >>
>> >> CI Failure:
>> >> LocatorLauncherRemoteFileIntegrationTest.startDeletesStaleControlFiles
>> >> failed with ConditionTimeoutException
>> >>
>> >>
>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-6183data=02%7C01%7Cbakera%40vmware.com%7Cdb5c3b93c1994223ff8b08d82362e699%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637298254971916697sdata=g67yHspVjXA8pJjp0shhYf7fZWltB7EexUXJ6sck8F8%3Dreserved=0
>> >>
>> >> I've debugged the latest occurrence of GEODE-6183 (intermittent failure
>> >> CI):
>> >>
>> >>
>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fconcourse.apachegeode-ci.info%2Fteams%2Fmain%2Fpipelines%2Fapache-support-1-13-main%2Fjobs%2FWindowsCoreIntegrationTestOpenJDK11%2Fbuilds%2F34data=02%7C01%7Cbakera%40vmware.com%7Cdb5c3b93c1994223ff8b08d82362e699%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637298254971916697sdata=bACP7X6c%2By%2FtzyN6S65UEJ0xWRYxgEhM1KyvlYaYbSU%3Dreserved=0
>> >>
>> >> The underlying cause is a missing dependency: the Attach API. In the
>> Oracle
>> >> JDK, the Attach API is found in the JAVA_HOME/lib/tools.jar. In some
>> JDKs,
>> >> including LibericaJDK, there may not be a tools.jar or it may be
>> missing
>> >> from our image of specific JDKs or a specific OS. I confirmed that the
>> >> Attach API is actually inside a different .jar on some Mac releases of
>> the
>> >> JDK.
>> >>
>> >> Other than JDK differences, I'm not sure why tools.jar would
>> intermittently
>> >> be missing from our testing image, but that's definitely the cause of
>> >> WindowsCoreIntegrationTestOpenJDK11 failing. I've reviewed a couple
>> other
>> >> older runs and it was the same intermittent cause of failure.
>> >>
>> >> Does anyone know if LibericaJDK includes tools.jar or the Attach API?
>> >>
>> >> Does anyone know how to verify that our images all have tools.jar or
>> its
>> >> equivalent?
>> >>
>> >> java.util.ServiceConfigurationError:
>> >> com.sun.tools.attach.spi.AttachProvider: Provider
>> >> sun.tools.attach.WindowsAttachProvider not found
>> >> at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:588)
>> >> at
>> >>
>> >>
>> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1211)
>> >> at
>> &

Re: Flaky test caused by missing JDK dependency

2020-07-08 Thread Kirk Lund
The Attach API is optional for Users running the product.

The Attach API is required to compile the classes that use the Attach API
and to run tests that cover this feature (such as "--pid").

On Wed, Jul 8, 2020 at 12:11 PM Anthony Baker  wrote:

> I thought we made the dependency on the Attach API optional when we added
> support for JDK 11?
>
> Anthony
>
>
> > On Jul 8, 2020, at 10:17 AM, Kirk Lund  wrote:
> >
> > To transition away from Attach API, the community needs a proposal to do
> so
> > and we'll need to deprecate the GFSH options that depend on Attach API
> such
> > as "--pid" in "status server --pid 20938". Even then we're looking at a
> > minimum of one major release before we can remove options after they are
> > deprecated.
> >
> > We haven't had a major release in 4+ years so don't hold your breath! :)
> >
> > On Wed, Jul 8, 2020 at 9:59 AM Sean Goller  wrote:
> >
> >> The Liberica JDK does not include the Attach API. I'm investigating why.
> >> Given the inherent insecurity of this API, I recommend we transition
> away
> >> from using it.
> >> 
> >> From: Kirk Lund 
> >> Sent: Monday, July 6, 2020 10:36 AM
> >> To: dev@geode.apache.org 
> >> Subject: Flaky test caused by missing JDK dependency
> >>
> >> CI Failure:
> >> LocatorLauncherRemoteFileIntegrationTest.startDeletesStaleControlFiles
> >> failed with ConditionTimeoutException
> >>
> >>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-6183data=02%7C01%7Cbakera%40vmware.com%7Cdb5c3b93c1994223ff8b08d82362e699%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637298254971916697sdata=g67yHspVjXA8pJjp0shhYf7fZWltB7EexUXJ6sck8F8%3Dreserved=0
> >>
> >> I've debugged the latest occurrence of GEODE-6183 (intermittent failure
> >> CI):
> >>
> >>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fconcourse.apachegeode-ci.info%2Fteams%2Fmain%2Fpipelines%2Fapache-support-1-13-main%2Fjobs%2FWindowsCoreIntegrationTestOpenJDK11%2Fbuilds%2F34data=02%7C01%7Cbakera%40vmware.com%7Cdb5c3b93c1994223ff8b08d82362e699%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637298254971916697sdata=bACP7X6c%2By%2FtzyN6S65UEJ0xWRYxgEhM1KyvlYaYbSU%3Dreserved=0
> >>
> >> The underlying cause is a missing dependency: the Attach API. In the
> Oracle
> >> JDK, the Attach API is found in the JAVA_HOME/lib/tools.jar. In some
> JDKs,
> >> including LibericaJDK, there may not be a tools.jar or it may be missing
> >> from our image of specific JDKs or a specific OS. I confirmed that the
> >> Attach API is actually inside a different .jar on some Mac releases of
> the
> >> JDK.
> >>
> >> Other than JDK differences, I'm not sure why tools.jar would
> intermittently
> >> be missing from our testing image, but that's definitely the cause of
> >> WindowsCoreIntegrationTestOpenJDK11 failing. I've reviewed a couple
> other
> >> older runs and it was the same intermittent cause of failure.
> >>
> >> Does anyone know if LibericaJDK includes tools.jar or the Attach API?
> >>
> >> Does anyone know how to verify that our images all have tools.jar or its
> >> equivalent?
> >>
> >> java.util.ServiceConfigurationError:
> >> com.sun.tools.attach.spi.AttachProvider: Provider
> >> sun.tools.attach.WindowsAttachProvider not found
> >> at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:588)
> >> at
> >>
> >>
> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1211)
> >> at
> >>
> >>
> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1220)
> >> at
> >>
> >>
> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1264)
> >> at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1299)
> >> at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1384)
> >> at
> >>
> >>
> jdk.attach/com.sun.tools.attach.spi.AttachProvider.providers(AttachProvider.java:258)
> >> at
> >>
> >>
> jdk.attach/com.sun.tools.attach.VirtualMachine.list(VirtualMachine.java:144)
> >> at
> >>
> >>
> org.apache.geode.internal.process.AttachProcessUtils.isProcessAlive(AttachProcessUtils.java:35)
> >> at
> >>
> >>
> org.apache.geode.internal.process.ProcessUtils.isProcessAlive(ProcessUtils.java:99)
> >> at
> >>
> >>
> org.apache.geode.internal.process.lang.AvailablePid.findAvailablePid(AvailablePid.java:117)
> >> at
> >>
> >>
> org.apache.geode.distributed.LauncherIntegrationTestCase.setUpAbstractLauncherIntegrationTestCase(LauncherIntegrationTestCase.java:92)
> >>
>
>


Re: Request for wiki permission

2020-07-08 Thread Kirk Lund
Ok, you should be good to go now, Eric! I added you to Geode AND gave you
write permissions.

On Wed, Jul 8, 2020 at 1:24 PM Kirk Lund  wrote:

> I see what's missing!
>
> On Wed, Jul 8, 2020 at 1:04 PM Eric Shu  wrote:
>
>> Here is what I have when I search the name with my profile:
>>
>>
>> https://cwiki.apache.org/confluence/users/viewuserprofile.action?username=eshu
>>
>> Personal
>> Full NameEric Shu
>> Email
>> Phone
>> IM
>> Website
>>
>> Not sure what else I supposedly to do.
>>
>> Regards,
>> Eric
>> 
>> From: Kirk Lund 
>> Sent: Wednesday, July 8, 2020 10:44 AM
>> To: dev@geode.apache.org 
>> Subject: Re: Request for wiki permission
>>
>> Hi Eric,
>>
>> I can't find the "eshu" account on
>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fdata=02%7C01%7Ceshu%40vmware.com%7C42904c454b304d2a248708d82366aa44%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637298271131613202sdata=dHngozyOCrQmZNTg%2FhebEUZxmVOleGDWPF3oQ2yytJc%3Dreserved=0.
>> Please make
>> sure you've created the account on that server first. Then I can add the
>> edit permissions for Geode.
>>
>> Thanks,
>> Kirk
>>
>> On Wed, Jul 8, 2020 at 9:58 AM Eric Shu  wrote:
>>
>> > Hi,
>> >
>> > I am trying to comment on a wiki doc and need permissions for
>> Confluence.
>> > Please grant it.
>> >
>> > UserName: eshu
>> >
>> > Regards,
>> > Eric
>> >
>>
>


Re: Request for wiki permission

2020-07-08 Thread Kirk Lund
I see what's missing!

On Wed, Jul 8, 2020 at 1:04 PM Eric Shu  wrote:

> Here is what I have when I search the name with my profile:
>
>
> https://cwiki.apache.org/confluence/users/viewuserprofile.action?username=eshu
>
> Personal
> Full NameEric Shu
> Email
> Phone
> IM
> Website
>
> Not sure what else I supposedly to do.
>
> Regards,
> Eric
> 
> From: Kirk Lund 
> Sent: Wednesday, July 8, 2020 10:44 AM
> To: dev@geode.apache.org 
> Subject: Re: Request for wiki permission
>
> Hi Eric,
>
> I can't find the "eshu" account on
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fdata=02%7C01%7Ceshu%40vmware.com%7C42904c454b304d2a248708d82366aa44%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637298271131613202sdata=dHngozyOCrQmZNTg%2FhebEUZxmVOleGDWPF3oQ2yytJc%3Dreserved=0.
> Please make
> sure you've created the account on that server first. Then I can add the
> edit permissions for Geode.
>
> Thanks,
> Kirk
>
> On Wed, Jul 8, 2020 at 9:58 AM Eric Shu  wrote:
>
> > Hi,
> >
> > I am trying to comment on a wiki doc and need permissions for Confluence.
> > Please grant it.
> >
> > UserName: eshu
> >
> > Regards,
> > Eric
> >
>


Re: Request for wiki permission

2020-07-08 Thread Kirk Lund
Hi Eric,

I can't find the "eshu" account on https://cwiki.apache.org/. Please make
sure you've created the account on that server first. Then I can add the
edit permissions for Geode.

Thanks,
Kirk

On Wed, Jul 8, 2020 at 9:58 AM Eric Shu  wrote:

> Hi,
>
> I am trying to comment on a wiki doc and need permissions for Confluence.
> Please grant it.
>
> UserName: eshu
>
> Regards,
> Eric
>


Re: about Liberica JDK

2020-07-08 Thread Kirk Lund
In the future, I think we should discuss and propose changing the JDK on
this dev-list before making the changes.

On Tue, Jul 7, 2020 at 9:01 AM Anthony Baker  wrote:

> Liberica is an OpenJDK distribution like AdoptOpenJDK, Oracle, RedHat,
> Amazon, Azul, etc.  I have yet to find a behavioral difference between the
> distributions—it’s mostly about ease of acquiring binaries and LTS support.
> Just for simplicity I would prefer to use a single JDK distribution across
> versions in our CI pipelines.
>
>
> Anthony
>
>
> > On Jul 7, 2020, at 2:21 AM, Alberto Bustamante Reyes
>  wrote:
> >
> > Hi devs,
> >
> > I have seen in develop branch this commit that changes openjdk by
> Liberica JDK (
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode%2Fpull%2F5312data=02%7C01%7Cbakera%40vmware.com%7C92aa27c47a164abdd6b808d822573a33%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637297105319626477sdata=X3V4jcQ7DWEVtJuUaXOQy%2F0l3xri9g4a0%2BfflLmui1g%3Dreserved=0
> ), although it was reverted later so I suppose there are still issues to be
> solved.
> >
> > I didn't know Liberica and I'm curious about the change. Why is this
> change being implemented?
> >
> > BR/
> >
> > Alberto B.
> >
>
>


Re: Flaky test caused by missing JDK dependency

2020-07-08 Thread Kirk Lund
To transition away from Attach API, the community needs a proposal to do so
and we'll need to deprecate the GFSH options that depend on Attach API such
as "--pid" in "status server --pid 20938". Even then we're looking at a
minimum of one major release before we can remove options after they are
deprecated.

We haven't had a major release in 4+ years so don't hold your breath! :)

On Wed, Jul 8, 2020 at 9:59 AM Sean Goller  wrote:

> The Liberica JDK does not include the Attach API. I'm investigating why.
> Given the inherent insecurity of this API, I recommend we transition away
> from using it.
> ____
> From: Kirk Lund 
> Sent: Monday, July 6, 2020 10:36 AM
> To: dev@geode.apache.org 
> Subject: Flaky test caused by missing JDK dependency
>
> CI Failure:
> LocatorLauncherRemoteFileIntegrationTest.startDeletesStaleControlFiles
> failed with ConditionTimeoutException
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-6183data=02%7C01%7Csgoller%40vmware.com%7Cbcfa660e6577442247ee08d821d3283c%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637296538083290882sdata=7HMMMxvHvdlJf8Awc9zAl7Xr9291Ep0HMoto5%2F%2BgUys%3Dreserved=0
>
> I've debugged the latest occurrence of GEODE-6183 (intermittent failure
> CI):
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fconcourse.apachegeode-ci.info%2Fteams%2Fmain%2Fpipelines%2Fapache-support-1-13-main%2Fjobs%2FWindowsCoreIntegrationTestOpenJDK11%2Fbuilds%2F34data=02%7C01%7Csgoller%40vmware.com%7Cbcfa660e6577442247ee08d821d3283c%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637296538083290882sdata=x1FIsAQOyZd%2Bvl%2BRRkzI8yGeFW%2B04sMuO4JM%2F67vu8w%3Dreserved=0
>
> The underlying cause is a missing dependency: the Attach API. In the Oracle
> JDK, the Attach API is found in the JAVA_HOME/lib/tools.jar. In some JDKs,
> including LibericaJDK, there may not be a tools.jar or it may be missing
> from our image of specific JDKs or a specific OS. I confirmed that the
> Attach API is actually inside a different .jar on some Mac releases of the
> JDK.
>
> Other than JDK differences, I'm not sure why tools.jar would intermittently
> be missing from our testing image, but that's definitely the cause of
> WindowsCoreIntegrationTestOpenJDK11 failing. I've reviewed a couple other
> older runs and it was the same intermittent cause of failure.
>
> Does anyone know if LibericaJDK includes tools.jar or the Attach API?
>
> Does anyone know how to verify that our images all have tools.jar or its
> equivalent?
>
> java.util.ServiceConfigurationError:
> com.sun.tools.attach.spi.AttachProvider: Provider
> sun.tools.attach.WindowsAttachProvider not found
> at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:588)
> at
>
> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1211)
> at
>
> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1220)
> at
>
> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1264)
> at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1299)
> at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1384)
> at
>
> jdk.attach/com.sun.tools.attach.spi.AttachProvider.providers(AttachProvider.java:258)
> at
>
> jdk.attach/com.sun.tools.attach.VirtualMachine.list(VirtualMachine.java:144)
> at
>
> org.apache.geode.internal.process.AttachProcessUtils.isProcessAlive(AttachProcessUtils.java:35)
> at
>
> org.apache.geode.internal.process.ProcessUtils.isProcessAlive(ProcessUtils.java:99)
> at
>
> org.apache.geode.internal.process.lang.AvailablePid.findAvailablePid(AvailablePid.java:117)
> at
>
> org.apache.geode.distributed.LauncherIntegrationTestCase.setUpAbstractLauncherIntegrationTestCase(LauncherIntegrationTestCase.java:92)
>


Flaky test caused by missing JDK dependency

2020-07-06 Thread Kirk Lund
CI Failure:
LocatorLauncherRemoteFileIntegrationTest.startDeletesStaleControlFiles
failed with ConditionTimeoutException
https://issues.apache.org/jira/browse/GEODE-6183

I've debugged the latest occurrence of GEODE-6183 (intermittent failure CI):
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-13-main/jobs/WindowsCoreIntegrationTestOpenJDK11/builds/34

The underlying cause is a missing dependency: the Attach API. In the Oracle
JDK, the Attach API is found in the JAVA_HOME/lib/tools.jar. In some JDKs,
including LibericaJDK, there may not be a tools.jar or it may be missing
from our image of specific JDKs or a specific OS. I confirmed that the
Attach API is actually inside a different .jar on some Mac releases of the
JDK.

Other than JDK differences, I'm not sure why tools.jar would intermittently
be missing from our testing image, but that's definitely the cause of
WindowsCoreIntegrationTestOpenJDK11 failing. I've reviewed a couple other
older runs and it was the same intermittent cause of failure.

Does anyone know if LibericaJDK includes tools.jar or the Attach API?

Does anyone know how to verify that our images all have tools.jar or its
equivalent?

java.util.ServiceConfigurationError:
com.sun.tools.attach.spi.AttachProvider: Provider
sun.tools.attach.WindowsAttachProvider not found
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:588)
at
java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1211)
at
java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1220)
at
java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1264)
at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1299)
at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1384)
at
jdk.attach/com.sun.tools.attach.spi.AttachProvider.providers(AttachProvider.java:258)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.list(VirtualMachine.java:144)
at
org.apache.geode.internal.process.AttachProcessUtils.isProcessAlive(AttachProcessUtils.java:35)
at
org.apache.geode.internal.process.ProcessUtils.isProcessAlive(ProcessUtils.java:99)
at
org.apache.geode.internal.process.lang.AvailablePid.findAvailablePid(AvailablePid.java:117)
at
org.apache.geode.distributed.LauncherIntegrationTestCase.setUpAbstractLauncherIntegrationTestCase(LauncherIntegrationTestCase.java:92)


Re: Question about gateway sender stopped and memory consumption

2020-07-02 Thread Kirk Lund
I would have expected unsent events to be stored in a queue that is backed
by a persistent region or something on disk. If that's not currently true,
then it seems like a good direction might be to make tmpDroppedEvents use a
durable queue of some sort that overflows to disk.



On Thu, Jul 2, 2020 at 10:33 AM Alberto Gomez 
wrote:

> Hi,
>
> We have observed that when a gateway sender is stopped in a site, all the
> events received while it is stopped are stored in the
> 'AbstractGatewaySender.tmpDroppedEvents' queue of the primary sender. The
> elements of this queue are not removed from this queue until the sender is
> started back again.
>
> This behavior implies that if the gateway sender is stopped for a long
> time, there is a risk of heap exhaustion in the members hosting primary
> senders.
>
> Under split brain situations, if lasting long enough, there could be heap
> exhaustion problems in servers due to the memory used by the gateway sender
> queues, even if overflow to disk is used -given that part of the event is
> always stored in memory.
> For those situations we had thought about stopping gateway senders when
> the memory used by the gateway sender queues reached a certain memory
> threshold. But according to the above, stopping the gateway senders would
> only make things worse.
>
> Would it make sense for the gateway sender not to store the received
> events in tmpDroppedEvents while it is stopped?
>
> Any suggestion on how to approach the problem of heap exhaustion due to
> the growth of gateway sender queues in long lasting split brain situations?
>
> Thanks in advance,
>
> Alberto G.
>
>
>


Re: Us vs Docker vs Gradle vs JUnit

2020-07-01 Thread Kirk Lund
I'm not a big fan of forking the Docker plugin and making it a new Geode
submodule. This approach kind of flies in the face of the intentions of OSS
in general. For example, we want folks contributing to Apache Geode rather
than forking Geode to create their own new project while never giving back
to this project.

If the original Docker plugin project refuses to maintain or take on new
contributors then some of us should help lead the creation of a new Docker
plugin repo in Github that is independent of Apache Geode. This way it
becomes a new living Docker plugin project that many contributors can
become involved in. It also becomes valuable to more projects than just
one. I've seen this happen with a number of OSS projects in which the
original maintainer of a project disappears or leaves and I think this is
generally accepted as the best approach for reviving a dead project that no
new committers can be added to.

On Tue, Jun 30, 2020 at 11:30 AM Jacob Barrett  wrote:

> All,
>
> We are in a bit of a pickle. As you recall from a few years back in an
> effort to both stabilize and parallelize integration, distributed and other
> integration/system like test we use Docker. Many of the tests reused the
> same ports for services which cause them to fail or interact with each
> other when run in parallel. By using Docker to isolate a test we put a
> bandage on that issue. The plugin overrides Gradle’s default forked runner
> by starting the runners in Docker containers and marshaling the execution
> parameters to those Dockerized runners.
>
> The Docker test plugin is effectively unmaintained. The author seems
> content on keeping it compatible with Gradle 4. We forked it to work with
> Gradle 5 and various other issues we have hit over the years. We have
> shared patches in the past with little luck in having them merged and still
> its only compatible with Gradle 4.8 at best. I spent some time trying to
> port it to Gradle 6 but its going to be a larger undertaking given that
> Gradle 6 is fully Java modules compatible. They added new members
> throughout to handle modules in addition to class paths.
>
> Long story short because our tests can’t be parallelized without a
> container system we are stuck. We can’t go to JUnit 5 without updating
> Docker plugin (potentially minor changes). We can’t go to Gradle 6 without
> updating the Docker plugin (potentially huge changes). Being stuck is not a
> good place. I see two paths out of this:
>
> 1) We buckle down and fix the tests so they can run in parallel via the
> normal forking mechanism of Gradle. I know some effort has been expended in
> this by using our new rules for starting servers. We should need to go
> further.
>
> 2) Fully invest in the Docker plugin. We would need to fork this off as a
> fully maintain sub-project of Geode. We would need to add to it support for
> both Gradle 6 and JUnit 5.
>
> My money is on fixing the tests. It is clear, at least from my exhaustive
> searching, nobody in the Gradle and JUnit communities are isolating their
> tests with containers. They are creating containers to host service for
> system level testing, see Testcontainers project. The tests themselves run
> in the local kernel space (not in container).
>
> We made this push in the C++ and .NET tests, a much smaller set of tests,
> and it works great. The framework takes care to create clusters that do not
> interact with each other on the same host. Some things in Geode make this
> harder than others, like http service not support ephemeral port selection,
> and gfsh not providing machine readable output about ephemeral port
> selections. We use port knocking to prevent the OS from assigning the port
> ephemerally to another process. The framework knocks, opens and then
> closes, all the ports it needs for the server/locator services and starts
> them explicitly on those ports. Because of port recycling rules in the OS
> another ephemeral port request won’t get those ports for some time after
> they are closed. It's not perfect but it works. Fixing Geode to support
> ephemeral port selection and a better reporting mechanisms for those port
> choices would be more ideal. Also, we only start services necessary for the
> test, like don’t start the http ports if they aren’t going to be used.
>
> I would love some feedback and thoughts on this issue. Does anyone else
> see a different path forward?
>
> -Jake
>
>
>
>
>
>


  1   2   3   4   5   6   7   8   9   10   >