Re: [DISCUSS] CI improvements

2017-11-02 Thread Sean Goller
Given the length of time precheckin seems to run, would it make sense to
break it up?

-Sean.

On Thu, Nov 2, 2017 at 11:49 AM, Dan Smith  wrote:

> Looks good. Should we go ahead and change this to run precheckin instead of
> build?
>
> -Dan
>
> On Thu, Nov 2, 2017 at 9:53 AM, Anthony Baker  wrote:
>
> > If you’d like to check this out, here’s the PR containing the pipeline
> and
> > job scripts:
> > https://github.com/apache/geode/pull/1006
> >
> > And the pipeline itself:
> > https://concourse.apachegeode-ci.info
> >
> > There are three pipelines defined:
> >
> > - develop:  runs `gradle build`.  Can be extended to include other
> > precheckin tests based on feedback.
> > - docker-images: builds the container used for the develop pipeline.
> > - meta: watches for changes to the pipeline files and automatically
> > updates the runtime pipelines.
> >
> > Authentication is integrated with GitHub.  If you want the ability to
> > manually stop/start jobs please request on the dev@g.a.o mailing list
> > (same as for Jenkins) and include your GitHub id.
> >
> > What do you think?
> >
> > Anthony
> >
> > > On Oct 6, 2017, at 7:08 AM, Anthony Baker  wrote:
> > >
> > > Hi all,
> > >
> > > I’d like to propose that we switch our continuous integration (CI)
> > > system from Jenkins [1] to Concourse [2].  I suggest this because we
> > > continue to experience a significant number of environment-related
> > > test failures.
> > >
> > > These issues include CPU interference from other Jenkins jobs on the
> > > same host, running out of disk space, port conflicts, and other
> > > gremlins.  The net effect is that we are only getting 1-2 successful
> > > builds per month.  Certainly not all test failures can be traced back
> > > to environmental issues.  However, internal testing on isolated VMs
> > > shows a combined success rate about 3X higher than ASF
> > > Jenkins for the same tests.  This is still definitely NotAwesome, but
> > > removing environmental factors will let us focus on stabilizing flaky
> > > tests.
> > >
> > > Concourse is an Apache-licensed open source CI system based on
> > > pipelines.  The pipelines are defined in a YML file containing job
> > > definitions—inputs, outputs, resources, and tasks.  A task is simply a
> > > bash script that returns 0/1 for success/failure.  A web UI displays
> > > build status.  Importantly, each job runs inside an isolated
> > > container.  The containers are load-balanced across a pool of workers.
> > > For an example of a build pipeline, see [3] for the pipeline used to
> > > build concourse itself.
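> > > As a rough sketch of that shape (resource, job, and image names here
> > > are illustrative, not Geode's actual pipeline):
> > >
> > >     resources:
> > >     - name: geode-source
> > >       type: git
> > >       source:
> > >         uri: https://github.com/apache/geode.git
> > >         branch: develop
> > >
> > >     jobs:
> > >     - name: build
> > >       plan:
> > >       - get: geode-source
> > >         trigger: true
> > >       - task: run-build    # the task's exit code (0/1) is the pass/fail signal
> > >         config:
> > >           platform: linux
> > >           image_resource:
> > >             type: docker-image
> > >             source: {repository: openjdk, tag: "8"}
> > >           inputs:
> > >           - name: geode-source
> > >           run:
> > >             path: bash
> > >             args: ["-c", "cd geode-source && ./gradlew build"]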
> > >
> > > A Concourse environment is deployed and managed in cloud environments
> > > through bosh [4].  Pivotal has agreed to donate AWS and/or GCP compute
> > > and storage resources as well as manage the infrastructure.  These
> > > project resources would be available for use by all committers and
> > > community members regardless of corporate affiliations.  Note that
> > > AFAIK there is no explicit requirement to host CI on ASF
> > > infrastructure—unlike for critical project resources such as source
> > > code, mailing lists, and issue tracking.
> > >
> > > The source for the pipeline and job scripts would reside within the
> > > geode-* repos.  Geode committers would be able to modify those, same
> > > as with our .travis.yml scripts.  All test results and build artifacts
> > > would be publicly viewable just like with our Jenkins build output
> > > today.  Requests for admin assistance would go through the dev@geode
> > > mailing list.
> > >
> > > Thoughts?  As a first step we could run both CI systems side-by-side
> > > and see how the Concourse approach works for our project.
> > >
> > > Thanks,
> > > Anthony
> > >
> > >
> > > [1] https://builds.apache.org/job/Geode-nightly/
> > > [2] https://concourse.ci
> > > [3] https://ci.concourse.ci
> > > [4] https://bosh.io
> >
> >
>


Re: [DISCUSS] CI improvements

2017-11-02 Thread Sean Goller
precheckin is literally './gradlew build :geode-assembly:acceptanceTest
integrationTest distributedTest flakyTest' :)
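Broken up, each of those targets would simply become its own CI job, e.g.:

    # one job per target instead of a single monolithic precheckin run
    ./gradlew build
    ./gradlew :geode-assembly:acceptanceTest
    ./gradlew integrationTest
    ./gradlew distributedTest
    ./gradlew flakyTest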

-S.

On Thu, Nov 2, 2017 at 1:10 PM, Dan Smith <dsm...@pivotal.io> wrote:

> On Thu, Nov 2, 2017 at 11:58 AM, Sean Goller <sgol...@pivotal.io> wrote:
>
> > Given the length of time precheckin seems to run, would it make sense to
> > break it up?
> >
> > -Sean.
> >
>
> Sure, as long as we don't miss anything :)
>
> -Dan
>


Concourse pipeline

2017-12-01 Thread Sean Goller
Hi everyone,
  If you haven't already heard, we have a new concourse pipeline
infrastructure up and running at https://concourse.apachegeode-ci.info/ .
When these jobs fail, email will be sent to the dev list, so be aware! :)

-Sean.


Re: Concourse pipeline

2017-12-01 Thread Sean Goller
Mail should be sent out only on failures, so I would hope they'd be less
frequent than Travis. I think Travis also processes PRs which contributes
to the noise.

-S.

On Fri, Dec 1, 2017 at 4:26 PM, Greg Chase <g...@gregchase.com> wrote:

> I wonder - do we want a separate email alias for these messages if they are
> going to be even more frequent than Travis?
>
> On Fri, Dec 1, 2017 at 3:24 PM, Sean Goller <sgol...@pivotal.io> wrote:
>
> > Hi everyone,
> >   If you haven't already heard, we have a new concourse pipeline
> > infrastructure up and running at https://concourse.apachegeode-ci.info/
> .
> > When these jobs fail, email will be sent to the dev list, so be aware! :)
> >
> > -Sean.
> >
>


Concourse infrastructure

2018-05-14 Thread Sean Goller
Currently the concourse infrastructure is suffering from massive internal
networking issues, we are working to resolve this as soon as possible. I'll
update the community as we progress on the repair.


Re: Concourse infrastructure

2018-05-14 Thread Sean Goller
Update: The infrastructure is still messed up. I've been working on
bringing up a replacement infrastructure, but it's taking longer than
anticipated. Hopefully I'll be able to finish up in the morning.

-S.

On Mon, May 14, 2018 at 9:59 AM, Sean Goller <sgol...@pivotal.io> wrote:

> Currently the concourse infrastructure is suffering from massive internal
> networking issues, we are working to resolve this as soon as possible. I'll
> update the community as we progress on the repair.
>


Re: Concourse infrastructure

2018-05-15 Thread Sean Goller
Update: We are back! Once a build makes it all the way through develop I'll
enable the metrics pipeline.

On Mon, May 14, 2018 at 9:12 PM, Sean Goller <sgol...@pivotal.io> wrote:

> Update: The infrastructure is still messed up. I've been working on
> bringing up a replacement infrastructure, but it's taking longer than
> anticipated. Hopefully I'll be able to finish up in the morning.
>
> -S.
>
> On Mon, May 14, 2018 at 9:59 AM, Sean Goller <sgol...@pivotal.io> wrote:
>
>> Currently the concourse infrastructure is suffering from massive internal
>> networking issues, we are working to resolve this as soon as possible. I'll
>> update the community as we progress on the repair.
>>
>
>


Re: Next release: 1.4.0

2018-01-05 Thread Sean Goller
The release pipeline is up:
https://concourse.apachegeode-ci.info/teams/main/pipelines/release-1.4.0

-Sean.

On Fri, Jan 5, 2018 at 10:00 AM, Anthony Baker  wrote:

> +1
>
> It should be pretty easy to clone the current pipeline for the 1.4.0
> release branch.
>
> I’ll plan to update the Jenkins jobs to run the `build` and
> `updateArchives` tasks since those still have value.
>
> Anthony
>
>
> > On Jan 4, 2018, at 5:03 PM, Alexander Murmann 
> wrote:
> >
> > The Concourse pipeline seems much more reliable at this point and the
> > pipelines should be providing equivalent test coverage. Given that, are
> > there any reasons to not deprecate Jenkins?
> >
> > On Thu, Jan 4, 2018 at 4:55 PM, Jason Huynh  wrote:
> >
> >> Hi Swapnil,
> >>
> >> GEODE-4140 was just marked for 1.4.  I think part of GEODE-4140 should
> be
> >> fixed because the process.ClusterConfigurationNotAvailableException
> should
> >> probably be reinstated.  If others don't think it's needed then feel
> free
> >> to remove the fix tag.
> >>
> >> -Jason
> >>
> >> On Thu, Jan 4, 2018 at 4:38 PM Dan Smith  wrote:
> >>
> >>> Our process up to this point has been to not ship until the jenkins
> >> builds
> >>> on the release branch pass. We've been experimenting with concourse in
> >>> parallel with jenkins, but the jenkins builds on develop at least are
> >> still
> >>> pretty messy. How are we going to ship this release? Should both be
> >>> passing?
> >>>
> >>> -Dan
> >>>
> >>> On Thu, Jan 4, 2018 at 4:23 PM, Swapnil Bawaskar
> >>> wrote:
> >>>
>  Since all the issues tagged for 1.4.0 release
>    rapidView=92=GEODE=planning=
>  GEODE-3688=visible=12341842>
>  have been addressed, I went ahead and created a release branch for
> >> 1.4.0.
> 
>  Can someone please update the concourse pipelines to pick up this
> >> release
>  branch?
> 
>  Thanks!
> 
> 
>  On Tue, Nov 28, 2017 at 1:58 PM Swapnil Bawaskar <
> sbawas...@pivotal.io
> >>>
>  wrote:
> 
> > Well, making sure that the JIRA's status is up-to-date and removing
> >> the
> > 1.4.0 version tag if the fix can wait for a later release.
> >
> > On Tue, Nov 28, 2017 at 12:22 PM Michael William Dodge <
>  mdo...@pivotal.io>
> > wrote:
> >
> >> What sort of update? I know that GEODE-4010 has a PR that's awaiting
> >> review and merge.
> >>
> >> Sarge
> >>
> >>> On 28 Nov, 2017, at 10:03, Swapnil Bawaskar
> >> wrote:
> >>>
> >>> I would like to volunteer as a release manager.
> >>> Currently there are 14 issues that are marked for 1.4.0. If you
> >> are
> >> working
> >>> on any of these, can you please update the JIRA?
> >>>
> >> https://issues.apache.org/jira/secure/RapidBoard.jspa?
>  rapidView=92=GEODE=planning=
>  GEODE-3688=visible=12341842
> >>>
> >>> Thanks!
> >>>
> >>> On Tue, Nov 28, 2017 at 9:42 AM Anthony Baker 
> >> wrote:
> >>>
>  Bump.  Any volunteers?  If not, I’ll do this.
> 
>  Anthony
> 
> 
> > On Nov 22, 2017, at 1:48 PM, Anthony Baker 
>  wrote:
> >
> > We released Geode 1.3.0 at the end of October.  Our next release
>  will
> >> be
>  1.4.0.  Questions:
> >
> > 1) Who wants to volunteer as a release manager?
> > 2) What do we want to include in the release?
> > 3) When do we want to do this?
> >
> > IMO, let's shoot for an early Dec release.
> >
> > Anthony
> >
> 
> 
> >>
> >>
> 
> >>>
> >>
>
>


Re: Logging in to concourse.apachegeode-ci.info

2018-02-22 Thread Sean Goller
Yeah, the geode pipelines are completely public. You don't need to log into
concourse to see anything, only to pause, kill or start jobs. Any changes
made to the pipeline are done through the git repo, and those changes are
automatically pushed to Concourse when merged.
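
For anyone who does have that access, the day-to-day interaction goes
through Concourse's fly CLI, along these lines (the target alias, pipeline,
and job names are illustrative):

    # authenticate once against the Geode Concourse
    fly -t geode login -c https://concourse.apachegeode-ci.info

    # manually re-trigger a job
    fly -t geode trigger-job -j develop/build

    # kill a running build of that job
    fly -t geode abort-build -j develop/build -b 42

    # pause an entire pipeline
    fly -t geode pause-pipeline -p develop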

-S.

On Wed, Feb 21, 2018 at 10:28 AM, Kirk Lund  wrote:

> I just assumed that I was *supposed* to login to that pipeline. Sounds like
> I probably don't need to login(?), but that's really not very obvious when
> looking at the pipeline gui. Nevermind... I don't need to kill or modify
> anything. I'm just used to logging into any concourse pipeline in order to
> view results.
>
> On Wed, Feb 21, 2018 at 9:40 AM, Anthony Baker  wrote:
>
> > Any committer can ask for login privileges (just like on Jenkins).  That
> > only gives you the ability to retrigger a job or kill a job.
> >
> > Is that what you want?  If you want to modify job definitions you can do
> > that by committing changes to [1].
> >
> > Anthony
> >
> > [1] https://github.com/apache/geode/tree/develop/ci <
> > https://github.com/apache/geode/tree/develop/ci>
> >
> >
> > > On Feb 21, 2018, at 9:24 AM, Dan Smith  wrote:
> > >
> > > I don't think you can log in to that pipeline. What are you wanting to
> do
> > > to it?
> > >
> > > -Dan
> > >
> > > On Wed, Feb 21, 2018 at 9:15 AM, Kirk Lund  wrote:
> > >
> > >> How are we supposed to login to
> > >> https://concourse.apachegeode-ci.info/teams/main/login?
> > >>
> > >> I'm trying to use the "login with GitHub" option (I think this worked
> > for
> > >> me before) but it keeps failing in a couple different ways... latest
> > >> failure was "failed to verify token", previous one was "verification
> > >> failed"
> > >>
> > >> Thanks,
> > >> Kirk
> > >>
> >
> >
>


Pipeline instability

2018-07-25 Thread Sean Goller
A major refactor is going in shortly so expect turbulence for the next hour
or so.

-Sean.


Re: Moving from gradle to maven

2018-07-18 Thread Sean Goller
This is a non-starter without a maven equivalent of the gradle dockerized
plugin. Switching to maven without that will mean longer testing times,
which I feel is unacceptable.

So far I've found this reference on stack overflow (
https://stackoverflow.com/questions/36808351/running-junit-tests-in-parallel-with-docker
) to a homebuilt solution, but I'm unsure how replicable it is.
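
For reference, that kind of homebuilt setup boils down to fanning test
classes out across containers from a script, roughly like this (the image
and source layout are illustrative):

    #!/usr/bin/env bash
    # Run each distributed test class in its own container, four at a time.
    find geode-core/src/distributedTest/java -name '*Test.java' \
      | sed 's|.*/java/||; s|/|.|g; s|\.java$||' \
      | xargs -P 4 -I {} docker run --rm -v "$PWD":/geode -w /geode \
          openjdk:8 ./gradlew distributedTest --tests {}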

-Sean.

On Wed, Jul 18, 2018 at 1:21 PM Jens Deppe  wrote:

> +1 For moving to maven.
>
> I think the blog Kirk linked hits all the relevant pain points.
>
> This is the third time we've done significant Gradle work and every time it
> is painful. It's also probably never going to get any better.
>
> For myself, Gradle certainly feels like a lot of magic happening under the
> covers - it feels like it requires a fair bit of mental effort to
> understand and distinguish configuration phases and execution phases and
> which parts fit into which phase. Maven has its own magic, but is
> definitely more linear and obvious in its execution steps.
>
> --Jens
>
> On Wed, Jul 18, 2018 at 12:27 PM Patrick Rhomberg 
> wrote:
>
> > +1 to correcting our current broken gradle build.
> >
> >
> > The fault, dear Brutus, is not in our [tools], but in ourselves.
> >
> > I think the root pain point is that our dependency tree is neither
> explicit
> > nor correct in several places.  I have myself had frequent issues
> > surrounding our Protobuf and OQLLexer classes requiring a command-line
> > build and re-import.  It's also why, after we bumped gradle versions, we
> > were prone to errors when building in parallel.
> >
> > Correctly documenting and making explicit the gradle build dependencies
> is
> > something that I am meaning to look into soon, but I am currently looking
> > into improving our pipelines and metrics scripting.
> >
> > On Wed, Jul 18, 2018 at 12:04 PM, Udo Kohlmeyer  wrote:
> >
> > > I must agree: the fact that IntelliJ cannot handle the current project
> > > structure suggests that we have a complicated project structure.
> > > Moving to maven would force a stricter project structure.
> > >
> > > I don't mind moving to maven, but I believe that we would have similar
> > > experiences with maven and a complex project structure. I was thinking
> > > we could move to the Gradle-Kotlin DSL, but that also would not solve
> > > the current structure problem.
> > >
> > > So...  +1 on moving to maven OR +1 on refactoring the current gradle
> > > setup to be less "custom" and maybe a little more rigid.
> > >
> > >
> > > On 7/18/18 11:00, Kirk Lund wrote:
> > >
> > >> Gradle appears to not play well with IntelliJ unless the project is
> > overly
> > >> simple. I don't want to spend my days fighting with tools that don't
> > work
> > >> well together.
> > >>
> > >> Here's an interesting blog article about moving from gradle to maven:
> > >>
> > >> https://blog.philipphauer.de/moving-back-from-gradle-to-maven/
> > >>
> > >> Any other data points or opinions about moving from gradle to maven?
> > >>
> > >>
> > >
> >
>


Major CI changes PR

2018-07-18 Thread Sean Goller
Hi, I've submitted a PR  to make
the pipeline much more fork-friendly for ease of testing. As soon as this
PR gets merged, I expect some degree of instability and redeployment of
pipelines, so please be aware of that if you see issues.

-Sean.


Re: [DISCUSS] Apache Geode 1.7.0 release branch created

2018-09-04 Thread Sean Goller
Reverting GEODE-5591 results in code that can produce an infinite loop, so
I don't feel that's a viable option. I feel as though the code treats bind
exceptions as transient occurrences, but my direct experience with them
leads me to the opposite conclusion. I don't believe a long wait time is
going to change the situation, especially since a TCP timeout scenario can
take up to 30 minutes to resolve itself. I believe it is better to fail
fast and hard, so I would suggest either failing immediately or a very
short timeout, say 5 or 10 seconds at most.
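
As a sketch of that fail-fast behavior (illustrative, not the actual
AcceptorImpl code):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.ServerSocket;

    // Bind, retrying briefly; give up after timeoutMillis instead of 2 minutes.
    static ServerSocket bindFailFast(int port, long timeoutMillis)
        throws IOException, InterruptedException {
      long deadline = System.currentTimeMillis() + timeoutMillis;
      while (true) {
        ServerSocket socket = new ServerSocket();
        try {
          socket.bind(new InetSocketAddress(port));
          return socket;
        } catch (IOException bindFailure) {
          socket.close();
          if (System.currentTimeMillis() >= deadline) {
            throw bindFailure; // fail hard: a held port rarely frees up quickly
          }
          Thread.sleep(250);
        }
      }
    }

    // e.g. bindFailFast(40404, 5_000) for the 5-second cap suggested above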

On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag  wrote:

> Currently we have a minor issue in the release branch as pointed out by
> Barry O.
> We will wait till a resolution is figured out for this issue.
>
> Steps:
> 1. create locator
> 2. start server --name=server1 --server-port=40404
> 3. start server --name=server2 --server-port=40405
> 4. create gateway-receiver --member=server1
> 5. create gateway-receiver --member=server2 `This gets stuck for 2 minutes`
>
> Is the 2 minute wait time acceptable? Should we document it? When we revert
> GEODE-5591, this issue does not happen.
>
> Regards
> Nabarun Nag
>
> On Tue, Sep 4, 2018 at 10:50 AM Nabarun Nag  wrote:
>
> > Status Update on release process for 1.7.0
> > - checkPom files are being modified to have version as 1.7.0 instead of
> > 1.8.0-SNAPSHOT
> > - gradle.properties file has been modified to reflect 1.7.0 as the
> version.
> > - Version.java has been reverted to remove all changes corresponding to
> > 1.8.0
> > - CommandInitializer.java has been reverted to remove changes for 1.8.0
> > - LuceneIndexCommandsJUnitTest.java has been modified to change
> > Version.GEODE_180 to GEODE_170
> > - LuceneIndexCommands.java has been modified to change Version.GEODE_180
> > to GEODE_170
> > -TXCommitMessage.java has been modified to change Version.GEODE_180 to
> > GEODE_170
> >
> > I will be getting in touch with the individual developers to verify my
> > changes.
> > The branch will be updated once we get a green light on these changes.
> >
> > Still need updates on these tickets:
> >
> > GEODE-5600 - [Patrick Rhomberg]
> > GEODE-5578 - [Robert Houghton]
> > GEODE-5492 - [Robert Houghton]
> > GEODE-5280 - [xiaojian zhou & Biju Kunjummen]
> >
> > These tickets have commits into develop but they are still open with fix
> > version as 1.8.0
> >
> > Regards
> > Nabarun Nag
> >
> >
> >
> > On Fri, Aug 31, 2018 at 3:38 PM Dale Emery  wrote:
> >
> >> I have resolved GEODE-5254
> >>
> >> Dale
> >>
> >> > On Aug 31, 2018, at 3:34 PM, Nabarun Nag  wrote:
> >> >
> >> > Requesting status update on the following JIRA tickets. These tickets
> >> have
> >> > commits into develop against its name but the status is still open /
> >> > unresolved.
> >> >
> >> > GEODE-5600 - [Patrick Rhomberg]
> >> > GEODE-5578 - [Robert Houghton]
> >> > GEODE-5492 - [Robert Houghton]
> >> > GEODE-5280 - [xiaojian zhou & Biju Kunjummen]
> >> > GEODE-5254 - [Dale Emery]
> >> >
> >> > GEODE-4794 - [Sai]
> >> > GEODE-5594 - [Sai]
> >> >
> >> > Regards
> >> > Nabarun Nag
> >> >
> >> >
> >> > On Fri, Aug 31, 2018 at 3:18 PM Nabarun Nag  wrote:
> >> >
> >> >>
> >> >> Please continue using 1.7.0 as a fix version in JIRA till the email
> >> comes
> >> >> in that the 1.7.0 release branch has be cut.
> >> >>
> >> >> Changing the fixed version for the following tickets to 1.7.0 from
> >> 1.8.0
> >> >> as these fixes will be included in the 1.7.0 release
> >> >>
> >> >> GEODE-5671
> >> >> GEODE-5662
> >> >> GEODE-5660
> >> >> GEODE-5652
> >> >>
> >> >> Regards
> >> >> Nabarun Nag
> >> >>
> >> >>
> >> >> On Fri, Aug 31, 2018 at 2:20 PM Nabarun Nag  wrote:
> >> >>
> >> >>> A new get/set cluster config feature was added to gfsh.
> >> >>> This needs to be added to the documentation.
> >> >>> Once this is done, the branch will be ready.
> >> >>>
> >> >>> Regards
> >> >>> Nabarun
> >> >>>
> >> >>>
> >> >>> On Fri, Aug 31, 2018 at 2:15 PM Alexander Murmann <
> >> amurm...@pivotal.io>
> >> >>> wrote:
> >> >>>
> >>  Nabarun, do you still see anything blocking cutting the release at
> >> this
> >>  point?
> >> 
> >>  Maybe we can even get a pipeline going today? 
> >> 
> >>  On Fri, Aug 31, 2018 at 10:38 AM, Sai Boorlagadda <
> >>  sai.boorlaga...@gmail.com
> >> > wrote:
> >> 
> >> > We can go ahead and cut 1.7 with out GEODE-5338 as I don't have
> the
> >>  code
> >> > ready.
> >> >
> >> > GEODE-5594, adds a new flag to enable hostname validation and is
> >>  disabled
> >> > by default so we are good with changes that are already merged and
> >> > documentation for GEODE-5594 is already merged.
> >> >
> >> > Naba, after the branch is cut we should delete windows jobs from
> the
> >>  branch
> >> > before we create the pipeline for 1.7.
> >> >
> >> > Apologies for holding up the release.
> >> >
> >> > Sai.
> >> >
> >> > On Fri, Aug 31, 2018, 

Re: 2 minute gateway startup time due to GEODE-5591

2018-09-04 Thread Sean Goller
If it's to get the release out, I'm fine with reverting. I don't like it,
but I'm not willing to die on that hill. :)

-S.

On Tue, Sep 4, 2018 at 4:38 PM Dan Smith  wrote:

> Splitting this into a separate thread.
>
> I see the issue. The two minute timeout is the constructor for
> AcceptorImpl, where it retries to bind for 2 minutes.
>
> That behavior makes sense for CacheServer.start.
>
> But it doesn't make sense for the new logic in GatewayReceiver.start() from
> GEODE-5591. That code is trying to use CacheServer.start to scan for an
> available port, trying each port in a range. That free port finding logic
> really doesn't want to have two minutes of retries for each port. It seems
> like we need to rework the fix for GEODE-5591.
>
> Does it make sense to hold up the release to rework this fix, or should we
> just revert it? Have we switched concourse over to using alpine linux,
> which I think was the original motivation for this fix?
>
> -Dan
>
> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith  wrote:
>
> > Why is it waiting at all in this case? Where is this 2 minute timeout
> > coming from?
> >
> > -Dan
> >
> > On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <
> sai.boorlaga...@gmail.com
> > > wrote:
> >
> >> So the issue is that it takes longer to start than previous releases?
> >> Also, is this wait time only when using Gfsh to create gateway-receiver?
> >>
> >> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag  wrote:
> >>
> >> > Currently we have a minor issue in the release branch as pointed out
> by
> >> > Barry O.
> >> > We will wait till a resolution is figured out for this issue.
> >> >
> >> > Steps:
> >> > 1. create locator
> >> > 2. start server --name=server1 --server-port=40404
> >> > 3. start server --name=server2 --server-port=40405
> >> > 4. create gateway-receiver --member=server1
> >> > 5. create gateway-receiver --member=server2 `This gets stuck for 2
> >> minutes`
> >> >
> >> > Is the 2 minute wait time acceptable? Should we document it? When we
> >> revert
> >> > GEODE-5591, this issue does not happen.
> >> >
> >> > Regards
> >> > Nabarun Nag
> >> >
> >>
> >
>


Re: 2 minute gateway startup time due to GEODE-5591

2018-09-04 Thread Sean Goller
It affects us on any linux platform that doesn't use glibc. It's not worth
holding up the release for. It's been this way for 20 years, right? ;)

Revert it.

On Tue, Sep 4, 2018 at 5:09 PM Udo Kohlmeyer  wrote:

> Imo (and I'm coming in cold)... We are NOT officially supporting Alpine
> linux (yet), which is the basis for this ticket, maybe push this to a
> later release?
>
> I prefer us getting out the fixes we have and release a more optimal
> version of GEODE-5591 later.
>
> IF this is a bug that will affect us on EVERY linux distro, then we
> should fix, otherwise, I vote to push it to 1.8
>
> --Udo
>
>
> On 9/4/18 16:38, Dan Smith wrote:
> > Splitting this into a separate thread.
> >
> > I see the issue. The two minute timeout is the constructor for
> > AcceptorImpl, where it retries to bind for 2 minutes.
> >
> > That behavior makes sense for CacheServer.start.
> >
> > But it doesn't make sense for the new logic in GatewayReceiver.start()
> from
> > GEODE-5591. That code is trying to use CacheServer.start to scan for an
> > available port, trying each port in a range. That free port finding logic
> > really doesn't want to have two minutes of retries for each port. It
> seems
> > like we need to rework the fix for GEODE-5591.
> >
> > Does it make sense to hold up the release to rework this fix, or should
> we
> > just revert it? Have we switched concourse over to using alpine linux,
> > which I think was the original motivation for this fix?
> >
> > -Dan
> >
> > On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith  wrote:
> >
> >> Why is it waiting at all in this case? Where is this 2 minute timeout
> >> coming from?
> >>
> >> -Dan
> >>
> >> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <
> sai.boorlaga...@gmail.com
> >>> wrote:
> >>> So the issue is that it takes longer to start than previous releases?
> >>> Also, is this wait time only when using Gfsh to create
> gateway-receiver?
> >>>
> >>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag  wrote:
> >>>
>  Currently we have a minor issue in the release branch as pointed out
> by
>  Barry O.
>  We will wait till a resolution is figured out for this issue.
> 
>  Steps:
>  1. create locator
>  2. start server --name=server1 --server-port=40404
>  3. start server --name=server2 --server-port=40405
>  4. create gateway-receiver --member=server1
>  5. create gateway-receiver --member=server2 `This gets stuck for 2
> >>> minutes`
>  Is the 2 minute wait time acceptable? Should we document it? When we
> >>> revert
>  GEODE-5591, this issue does not happen.
> 
>  Regards
>  Nabarun Nag
> 
>
>


Concourse instability

2018-07-05 Thread Sean Goller
We're experiencing a minor amount of instability relating to filled-up
disks on workers, along with a heavy job-execution load. We're recreating
workers so for the next hour or so things may not be working optimally.
Thank you for your patience.


-Sean.


Re: Concourse instability

2018-07-05 Thread Sean Goller
I think we're firing on all cylinders again. I'll be monitoring it for the
rest of the day for issues.

On Thu, Jul 5, 2018 at 11:39 AM Sean Goller  wrote:

> We're experiencing a minor amount of instability relating to filled-up
> disks on workers, along with a heavy job-execution load. We're recreating
> workers so for the next hour or so things may not be working optimally.
> Thank you for your patience.
>
>
> -Sean.
>
>


concourse excitement

2018-07-06 Thread Sean Goller
We're having some unintended instability in the geode concourse at the
moment. The Concourse ATC VM is being recreated, so some DNS entries may
need to be updated once everything settles down. For the moment, however,
the geode concourse is going to be inaccessible for a bit.

My apologies, it did not seem like the work we were doing would impact
concourse in this way.

-Sean.


Re: [PROPOSAL] use default value for validate-serializable-objects in dunit

2018-03-15 Thread Sean Goller
I agree with this. We should have a default state that reflects an “out of
the box” configuration, and if a test expects a different configuration, it
should manage that within the context of the test.
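
A sketch of what that could look like, assuming the usual dunit hook for
per-test properties (the subclass and filter value are illustrative):

    import java.util.Properties;

    // Opts a single dunit test into the feature; everything else keeps
    // product defaults.
    public class RegionWithValidationDUnitTest extends RegionDUnitTest {

      @Override
      public Properties getDistributedSystemProperties() {
        Properties props = super.getDistributedSystemProperties();
        props.setProperty("validate-serializable-objects", "true");
        props.setProperty("serializable-object-filter", "example.AllowedValue");
        return props;
      }
    }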

-Sean

On Tue, Mar 13, 2018 at 10:04 AM Kirk Lund  wrote:

> I want to propose using the default value for validate-serializable-object
> in dunit tests instead of forcing it on for all dunit tests. I'm
> sympathetic to the reason why this was done: ensure that all existing code
> and future code will function properly with this feature enabled.
> Unfortunately running all dunit tests with it turned on is not a good way
> to achieve this.
>
> Here are my reasons:
>
> 1) All tests should start out with the same defaults that Users have. If we
> don't do this, we are going to goof up sometime and release something that
> only works with this feature turned on or worsen the User experience of
> Geode in some way.
>
> 2) All tests should have sovereign control over their own configuration. We
> should strive towards being able to look at a test and determine its config
> at a glance without having to dig through the framework or other classes
> for hidden configuration. We continue to improve dunit in this area but I
> think adding to the problem is going in the wrong direction.
>
> 3) It sets a bad precedent. Do we follow this approach once or keep adding
> additional non-default features when we need to test them too? Next one is
> GEODE-4769 "Serialize region entry before putting in local cache" which
> will be disabled by default in the next Geode release and yet by turning it
> on by default for all of precheckin we were able to find lots of problems
> in both the product code and test code.
>
> 4) This is already starting to cause confusion for developers thinking its
> actually a product default or expecting it to be enabled in other
> (non-dunit) tests.
>
> Alternatives for test coverage:
>
> There really are no reasonable shortcuts for end-to-end test coverage of
> any feature. We need to write new tests or identify existing tests to
> subclass with the feature enabled.
>
> 1) Subclass specific tests to turn validate-serializable-object on for that
> test case. Examples of this include a) dunit tests that execute Region
> tests with OffHeap enabled, b) dunit tests that flip on HTTP over GFSH, c)
> dunit tests that run with SSL or additional security enabled.
>
> 2) Write new tests that cover all features with
> validate-serializable-object
> enabled.
>
> 3) Rerun all of dunit with and without the option. This doesn't sound very
> reasonable to me, but it's the closest to what we really want or need.
>
> Any other ideas or suggestions other than forcing all dunit tests to run
> with a non-default value?
>
> Thanks,
> Kirk
>


Re: Geode 1.6.0 release branch has been created

2018-04-13 Thread Sean Goller
The corresponding pipeline has been created:
https://concourse.apachegeode-ci.info/teams/main/pipelines/release-1.6.0

On Thu, Apr 12, 2018 at 3:03 PM, Michael Stolz  wrote:

> Can someone please create a concourse pipeline for this release?
>
>
> --
> Mike Stolz
>


Re: [DISCUSS]: Tests requiring external services

2018-04-03 Thread Sean Goller
I'm actually fine with putting it in AcceptanceTest for now.

Ideally I'd like to see something like JDBC connection strings that could
be passed in as properties via the command-line, and if they're not present
the relevant tests don't get run. That way the entity running the tests can
decide the best way to enable those tests.
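
A sketch of that property-gated pattern with JUnit assumptions (the
property name and test class are illustrative):

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assume.assumeTrue;

    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.junit.Before;
    import org.junit.Test;

    public class MySqlJdbcConnectorTest {
      private Connection connection;

      @Before
      public void connectIfConfigured() throws Exception {
        // e.g. ./gradlew serviceTest -Dmysql.jdbc.url=jdbc:mysql://localhost/test
        String url = System.getProperty("mysql.jdbc.url");
        assumeTrue("mysql.jdbc.url not set; skipping", url != null);
        connection = DriverManager.getConnection(url);
      }

      @Test
      public void canReachTheDatabase() throws Exception {
        assertFalse(connection.isClosed());
      }
    }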

On Tue, Apr 3, 2018 at 4:11 PM, Jens Deppe  wrote:

> I'm in favor of using docker for test isolation. We already have an
> 'AcceptanceTest' category which you might consider using instead of
> creating yet another category.
>
> --Jens
>
> On Tue, Apr 3, 2018 at 2:02 PM, Nick Reich  wrote:
>
> > Team,
> >
> > As part of validating the new JDBC connector for Geode, we have a need
> for
> > tests that involving connecting to specific databases (like MySQL and
> > Postgres) to validate proper function with those databases. Since these
> > tests require connecting to outside services, unlike existing Geode
> tests,
> > we are seeking suggestions on how to best incorporate such tests into
> > Geode. The approach we have taken so far is to utilize Docker (and Docker
> > Compose) to take care of spinning up our external services for the
> duration
> > of the tests. This, however requires that Docker and Docker Compose be
> > installed on any machine that the tests are run on. Additionally, the
> > Concourse pipeline for validating develop is incompatible with use of
> > Docker for distributed tests, due to the way they are already being run
> > within Docker containers of their own (it seems possible to make it work,
> > but would add overhead to all tests and would be a challenge to
> implement).
> >
> > To address these issues, we are considering having these tests run under
> a
> > new task, such as "serviceTest" (instead of IntegrationTest or
> > distributedTest). That way, developers could run all other tests normally
> > on their machines, only requiring Docker and Docker Compose if they wish
> to
> > run these specific tests. This would also allow them to be their own task
> > in Concourse, eliminating the issues that plague integrating these tests
> > there.
> >
> > Are there other ideas on how to manage these tests or concerns with the
> > proposed approach?
> >
>


Re: [DISCUSS] New List for Commit and CI Emails

2018-03-21 Thread Sean Goller
Concourse sends mail whenever a job fails.
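
In the pipeline that's done per job with an on_failure hook that puts to an
email resource, roughly (the resource name and params are illustrative of
the community email resource, not necessarily our exact config):

    jobs:
    - name: distributed-test
      plan:
      - get: geode
        trigger: true
      - task: run-tests
        file: geode/ci/distributed-test.yml
      on_failure:
        put: notify-dev-list   # an SMTP-backed email resource
        params:
          subject_text: "distributed-test failed"
          body_text: "See the Concourse build log for details."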

On Wed, Mar 21, 2018 at 9:49 AM, Swapnil Bawaskar 
wrote:

> I know travis is already configured to send emails only when the build
> breaks and then when it is fixed. Is concourse configured the same?
>
> On Wed, Mar 21, 2018 at 9:38 AM Patrick Rhomberg 
> wrote:
>
> > I'm with Swapnil on this one.  I think the way we make it less noisy is
> to
> > take the time to fix the failing tests.
> >
> > I suppose we could split the difference and give the CI emails a, say,
> > daily cadence.  No news is good news, or else it gives you all the
> failures
> > in the last 24 hours.  Don't know how easy that would be to cache and
> > report under the existing framework, though.
> >
> > On Wed, Mar 21, 2018 at 12:05 AM, Jacob Barrett 
> > wrote:
> >
> > > It’s sad that the most frequent spammer... er... I mean mailer is the
> > > new CI process. If we aren’t going to send it elsewhere how can we make
> > it
> > > less noisy?
> > >
> > > -Jake
> > >
> > >
> > > > On Mar 20, 2018, at 8:37 PM, Dan Smith  wrote:
> > > >
> > > > I was curious about the stats for bot vs. humans on the dev list. Out
> > of
> > > > 915 messages, looks like we're about 50% robot.
> > > >
> > > > I'm still be in favor of not sending these messages to dev@geode.
> Long
> > > time
> > > > members have probably already created a mail filter by now (I know I
> > > have)
> > > > so we're only hurting newbies by sending a bunch of messages.
> > > >
> > > > 1) apac...@gmail.com 241
> > > > 2) Spring CI 109
> > > > 3) Kirk Lund 63
> > > > 4) Apache Jenkins Server 51
> > > > 5) Anthony Baker 41
> > > > 6) Dan Smith 40
> > > > 7) Travis CI 38
> > >
> >
>


Re: Next release: 1.5.0

2018-03-05 Thread Sean Goller
1.5.0 pipeline is up and running, please take a look at it and let the list
know if there are problems.

https://concourse.apachegeode-ci.info/teams/main/pipelines/release-1.5.0

On Mon, Mar 5, 2018 at 11:07 AM, Anthony Baker  wrote:

> LGTM
>
> > On Mar 2, 2018, at 4:05 PM, Swapnil Bawaskar 
> wrote:
> >
> > Thanks Dave!
> >
> > All, I have created a release branch (
> > https://github.com/apache/geode/tree/release/1.5.0) Please review.
> >
> > On Fri, Mar 2, 2018 at 9:56 AM Dave Barnes  wrote:
> >
> >> Status on the 3 doc issues:
> >> GEODE-4737 / GEODE-3915: JSON args in gfsh - Done
> >> GEODE-4101:  redirect-output - Done
> >> GEODE-3948: client timeout - Done
> >>
> >>
> >> On Thu, Mar 1, 2018 at 11:22 AM, Swapnil Bawaskar
> >> wrote:
> >>
> >>> I will take up the release management task for 1.5.0
> >>>
> >>> On Mon, Feb 26, 2018 at 5:03 PM Dave Barnes 
> wrote:
> >>>
>  Status on the 3 doc issues:
>  GEODE-4737 / GEODE-3915: JSON args in gfsh - Karen's got it covered
>  GEODE-4101:  redirect-output  - Dave, in process, on track
>  GEODE-3948: client timeout - Dave, in process. Probably on track -
> will
>  keep you posted
> 
>  On Mon, Feb 26, 2018 at 11:07 AM, Anthony Baker 
> >>> wrote:
> 
> > Just checking in as we’re approaching the end of February.  We’ve
>  finished
> > around 200 issues and enhancements with 3 documentation updates open
> >>> [1].
> > Is this a good time for another release?
> >
> > Any takers to do the release management tasks for 1.5.0?
> >
> > Anthony
> >
> > [1] https://issues.apache.org/jira/projects/GEODE/versions/12342395
> >> <
> > https://issues.apache.org/jira/projects/GEODE/versions/12342395>
> >
> >
> >> On Feb 7, 2018, at 1:56 PM, Anthony Baker 
> >> wrote:
> >>
> >> Hi all,
> >>
> >> Nice work on getting the 1.4.0 release out the door!  Next up is
> >>> 1.5.0.
> > Any one want to volunteer for release manager?  If you haven’t done
> >>> this
> > before and would like to try, please review [1].
> >>
> >> I’ve been advocating for more frequent releases.  I’d love see a
> >>> March
> > release—which means we would need to be ready to cut the release
> >> branch
>  in
> > early March.
> >>
> >> Thoughts?
> >>
> >> Anthony
> >>
> >> [1]
>  https://cwiki.apache.org/confluence/display/GEODE/Release+Steps?src=
> > contextnavpagetreemode <
>  https://cwiki.apache.org/confluence/display/GEODE/
> > Release+Steps?src=contextnavpagetreemode>
> >>
> >
> >
> 
> >>>
> >>
>
>


Re: Concourse upgrade

2018-10-11 Thread Sean Goller
This should be resolved at this point. If anyone sees any weirdness now,
please inform the list.

On Wed, Oct 10, 2018 at 1:03 PM Bruce Schuchardt 
wrote:

> Concourse runs have started to fail
>
> ERROR: JAVA_HOME is set to an invalid directory:
> /usr/lib/jvm/java--openjdk-amd64
> Please set the JAVA_HOME variable in your environment to match the
> location of your Java installation.
>
>
> On 10/10/18 11:03 AM, Sean Goller wrote:
> > We're going to be upgrading the concourse infrastructure to the latest
> > version today, since 1.7 has been released. Best case it will be fairly
> > seamless and everything will be unicorns and puppies and happiness. Worst
> > case it'll be messed up for a few days.
> >
> > -Sean.
> >
>
>


Pipeline names changing shortly

2018-10-15 Thread Sean Goller
In order to standardize CI deployment and to improve quality of life around
running pipelines for forks, we've merged a number of changes to the ci/
directory which will create new pipelines for the develop branch. If you
have direct links to specific pipelines (specifically develop ones) those
links will eventually break. Once we've deployed the new set of pipelines,
(real soon now) we will pause the legacy pipelines and leave them up for a
week or so.

-Sean.


Re: Pipeline names changing shortly

2018-10-15 Thread Sean Goller
New pipelines for develop (apache-develop-main) and PR (apache-develop-pr)
are up. We'll bring up the rest of the pipelines (metrics, examples, etc.)
tomorrow.

On Mon, Oct 15, 2018 at 3:03 PM Sean Goller  wrote:

> In order to standardize CI deployment and to improve quality of life
> around running pipelines for forks, we've merged a number of changes to the
> ci/ directory which will create new pipelines for the develop branch. If
> you have direct links to specific pipelines (specifically develop ones)
> those links will eventually break. Once we've deployed the new set of
> pipelines, (real soon now) we will pause the legacy pipelines and leave
> them up for a week or so.
>
> -Sean.
>


concourse problems

2018-10-24 Thread Sean Goller
The web front end for the geode concourse is currently experiencing some
issues. we are actively troubleshooting this.

-Sean.


Re: concourse problems

2018-10-24 Thread Sean Goller
This should be stable again.

-S.

On Wed, Oct 24, 2018 at 9:36 AM Sean Goller  wrote:

> The web front end for the geode concourse is currently experiencing some
> issues. we are actively troubleshooting this.
>
> -Sean.
>


Concourse upgrade

2018-10-10 Thread Sean Goller
We're going to be upgrading the concourse infrastructure to the latest
version today, since 1.7 has been released. Best case it will be fairly
seamless and everything will be unicorns and puppies and happiness. Worst
case it'll be messed up for a few days.

-Sean.


concourse environment recreation

2018-09-20 Thread Sean Goller
Unfortunately due to some fat fingering on my part, the concourse
deployment was destroyed. We are in the process of recreating it. The
impact of this is that build logs and history are gone, but artifacts are still
present. Any jobs that were running at the time are gone. Pipelines will
need to be redeployed. ETA end of day.

-Sean.


Re: Failing to get resources from Maven in CI

2018-12-07 Thread Sean Goller
I agree. We should deploy a caching server to resolve this.
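
If we stand one up, pointing builds at it can be as small as a Gradle init
script along these lines (the mirror URL is hypothetical):

    // mirror.init.gradle -- route dependency resolution through the CI cache
    allprojects {
      buildscript {
        repositories {
          maven { url 'https://maven-cache.example-ci.internal/repository/proxy' }
        }
      }
      repositories {
        maven { url 'https://maven-cache.example-ci.internal/repository/proxy' }
      }
    }

    // usage: ./gradlew --init-script mirror.init.gradle build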

On Thu, Dec 6, 2018 at 4:16 PM Jacob Barrett  wrote:

> Given that this seems to be an ongoing problem with bintray I think it is
> time we put a local caching repository in our CI network. The native
> project ran into this same issue for months at one point. Depending on
> where in the internet you live the bintray host you get can be flakey. For
> native we would get a glacially slow server from a connections on a certain
> ISP. If we put Artifactory or something on a GCP instance and proxy all
> Maven dependencies through it we could cut down these issues significantly.
> If the artifact is already cached, nothing to do, but it needs to fetch
> upstream it is really good about trying multiple sources and retrying on
> transient failures.
>
> Objections? Thoughts?
>
> -Jake
>
>
> > On Dec 6, 2018, at 4:01 PM, Helena Bales  wrote:
> >
> > Hello,
> >
> > It looks like we are still having some problems with Maven in CI. I got
> > the following error in the pre-checkin for this PR
> > <https://github.com/apache/geode/pull/2964> (the first commit). Since I
> > didn't change any dependencies, I would expect the image to already have
> > all the dependencies, and since we increased the timeout for the Maven
> > connection, I would expect us to be able to connect to Maven. This issue
> is
> > coming up frequently enough that I think we should do something about it.
> >
> > What can we do to fix this?
> >
> > Part of the error message:
> >
> >> Could not resolve com.netflix.nebula:nebula-gradle-interop:0.6.0.
> > Required by:
> > project : > com.netflix.nebula:nebula-project-plugin:5.1.0
> > project : > com.netflix.nebula:nebula-project-plugin:5.1.0 >
> > com.netflix.nebula:gradle-dependency-lock-plugin:6.0.0
> >> Could not resolve com.netflix.nebula:nebula-gradle-interop:0.6.0.
> >> Could not get resource
> > 'https://plugins.gradle.org/m2/com/netflix/nebula/nebula-gradle-interop/0.6.0/nebula-gradle-interop-0.6.0.pom'.
> >> Could not GET
> > 'https://plugins.gradle.org/m2/com/netflix/nebula/nebula-gradle-interop/0.6.0/nebula-gradle-interop-0.6.0.pom'.
> >> Connect to jcenter.bintray.com:443 [jcenter.bintray.com/75.126.118.188]
> > failed: connect timed out
>
>


JIRA write access?

2018-11-26 Thread Sean Goller
Hi, In order to start dealing with some JIRA tickets could I get assign and
resolve access?
My apache username is smgoller.

Thanks!

-Sean.


Concourse upgrade

2019-10-02 Thread Sean Goller
We will be upgrading the concourse infrastructure to version 5.5.1 in the
next few hours. Please be aware there will be turbulence.

-Sean.


Re: Flaky test caused by missing JDK dependency

2020-07-08 Thread Sean Goller
The Liberica JDK does not include the Attach API. I'm investigating why. Given 
the inherent insecurity of this API, I recommend we transition away from using 
it.

From: Kirk Lund 
Sent: Monday, July 6, 2020 10:36 AM
To: dev@geode.apache.org 
Subject: Flaky test caused by missing JDK dependency

CI Failure:
LocatorLauncherRemoteFileIntegrationTest.startDeletesStaleControlFiles
failed with ConditionTimeoutException
https://issues.apache.org/jira/browse/GEODE-6183

I've debugged the latest occurrence of GEODE-6183 (intermittent failure CI):
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-13-main/jobs/WindowsCoreIntegrationTestOpenJDK11/builds/34

The underlying cause is a missing dependency: the Attach API. In the Oracle
JDK, the Attach API is found in the JAVA_HOME/lib/tools.jar. In some JDKs,
including LibericaJDK, there may not be a tools.jar or it may be missing
from our image of specific JDKs or a specific OS. I confirmed that the
Attach API is actually inside a different .jar on some Mac releases of the
JDK.

Other than JDK differences, I'm not sure why tools.jar would intermittently
be missing from our testing image, but that's definitely the cause of
WindowsCoreIntegrationTestOpenJDK11 failing. I've reviewed a couple other
older runs and it was the same intermittent cause of failure.

Does anyone know if LibericaJDK includes tools.jar or the Attach API?

Does anyone know how to verify that our images all have tools.jar or its
equivalent?

java.util.ServiceConfigurationError:
com.sun.tools.attach.spi.AttachProvider: Provider
sun.tools.attach.WindowsAttachProvider not found
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:588)
at
java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1211)
at
java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1220)
at
java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1264)
at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1299)
at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1384)
at
jdk.attach/com.sun.tools.attach.spi.AttachProvider.providers(AttachProvider.java:258)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.list(VirtualMachine.java:144)
at
org.apache.geode.internal.process.AttachProcessUtils.isProcessAlive(AttachProcessUtils.java:35)
at
org.apache.geode.internal.process.ProcessUtils.isProcessAlive(ProcessUtils.java:99)
at
org.apache.geode.internal.process.lang.AvailablePid.findAvailablePid(AvailablePid.java:117)
at
org.apache.geode.distributed.LauncherIntegrationTestCase.setUpAbstractLauncherIntegrationTestCase(LauncherIntegrationTestCase.java:92)
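
One quick way to answer the verification question is a tiny probe for the
Attach API, using the class named in the stack trace above (a sketch to run
on each image):

    // AttachApiCheck.java -- exits nonzero if the Attach API is absent.
    public class AttachApiCheck {
      public static void main(String[] args) {
        try {
          Class.forName("com.sun.tools.attach.VirtualMachine");
          System.out.println("Attach API present");
        } catch (ClassNotFoundException missing) {
          System.err.println("Attach API missing: " + missing);
          System.exit(1);
        }
      }
    }

On JDK 8 this needs tools.jar on the classpath; on JDK 9+ the class lives in
the jdk.attach module.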


Re: [VOTE] Apache Geode 1.12.3.RC3

2021-06-30 Thread Sean Goller
+1

Looks good from what I can see.

On Fri, Jun 25, 2021 at 1:15 PM Owen Nichols  wrote:

> Hello Geode Dev Community,
>
> I'd like to propose a 1.12 patch release.
>
> This is a release candidate for Apache Geode version 1.12.3.RC3.
> Note: This is the first vote email due to a build issue with the first two
> RCs.
> Thanks to all the community members for their contributions to this
> release!
>
> Please do a review and give your feedback, including the checks you
> performed.
>
> Voting deadline:
> 3PM PDT Wed, June 30 2021.
>
> Please note that we are voting upon the source tag:
> rel/v1.12.3.RC3
>
> Release notes:
>
> https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.12.3
>
> Source and binary distributions:
> https://dist.apache.org/repos/dist/dev/geode/1.12.3.RC3/
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachegeode-1084
>
> GitHub:
> https://github.com/apache/geode/tree/rel/v1.12.3.RC3
> https://github.com/apache/geode-examples/tree/rel/v1.12.3.RC3
> https://github.com/apache/geode-native/tree/rel/v1.12.3.RC3
> https://github.com/apache/geode-benchmarks/tree/rel/v1.12.3.RC3
>
> Pipelines:
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-12-main
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-12-rc
>
> Geode's KEYS file containing PGP keys we use to sign the release:
> https://github.com/apache/geode/blob/develop/KEYS
>
> Command to run geode-examples:
> ./gradlew -PgeodeReleaseUrl=
> https://dist.apache.org/repos/dist/dev/geode/1.12.3.RC3
> -PgeodeRepositoryUrl=
> https://repository.apache.org/content/repositories/orgapachegeode-1084
> build runAll
>
> Regards
> Owen Nichols
>


Re: Need Access to apachegeode/geode-native-build on docker hub

2021-02-22 Thread Sean Goller
Travis can pull authenticated docker images so GCR is usable. I'm not
thrilled about a docker image that gets pushed to by a release process
having a 'latest' tag that is not the same as the latest release artifact.
We have a few options here:

1) Update travis to access GCR in an authenticated fashion. This leaves the
dockerhub image alone.
2) Periodically generate a snapshot native-build image, push it to the
"snapshot" tag on dockerhub and don't update latest (sketched below). This
preserves
functionality and doesn't require travis to possess any credentials.
3) Just overwrite latest at will.  This implies that we don't actually care
what latest is from a release standpoint so why are we pushing to it as
part of a release at all?
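
Option 2 is just a couple of docker commands on whatever cadence we choose
(a sketch; the image path matches the one discussed in this thread):

    # Build and publish a snapshot image without touching 'latest'.
    docker build -t apachegeode/geode-native-build:snapshot .
    docker push apachegeode/geode-native-build:snapshot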

-Sean.



On Mon, Feb 22, 2021 at 1:41 PM Jacob Barrett  wrote:

> The original request was for access to update the docker image we have
> been using for PRs and such. If you all want to discussion on doing
> something different with its usage or where it is stored please start a new
> thread.
>
> Mike, your image has been updated and all should be happy now. If you
> still want access to do this yourself I don’t have the magic to make that
> happen.
>
> -Jake
>
>
> On Feb 22, 2021, at 1:28 PM, Owen Nichols <onich...@vmware.com> wrote:
>
> I don't believe LGTM uses it at all, but I do see that Travis is
> configured to use "latest" tag.
>
> "latest" tag on apachegeode dockerhub is assumed to be a synonym for
> "latest released version" (currently 1.13.1)
>
> If apachegeode dockerhub makes the most sense for hosting CI images, I
> wonder if it could be as simple as using a different tag-naming scheme for
> images that do not correspond to releases, maybe "latest-CI" or
> "1.14.0-dev" or something?
>
> On 2/22/21, 12:12 PM, "Jacob Barrett" <jabarr...@vmware.com> wrote:
>
>Wherever we want to put it, if not dockerhub anymore, it must be
> accessible by Travis and LGTM or we also stop using Travis and LGTM.
>
> On Feb 22, 2021, at 9:06 AM, Anthony Baker <bak...@vmware.com> wrote:
>
> Where are we hosting images for the geode (not the geode-native)
> pipeline?  While it might make sense to use the same repo across the board,
> the images for geode-native and geode-site can be useful beyond just CI.  I
> know I use the geode-native-build image to vet releases.
>
> Anthony
>
>
> On Feb 22, 2021, at 8:12 AM, Jacob Barrett <jabarr...@vmware.com> wrote:
>
> This docker image has been there for years and is not related to a release
> but to the LGTM and Travis builds. I have access and can bump the image for
> you Mike.
>
> -Jake
>
> On Feb 17, 2021, at 1:35 PM, Owen Nichols <onich...@vmware.com> wrote:
>
> apachegeode images on dockerhub correspond to releases. To make a change,
> the change must first come to appropriate release branch, then a Geode
> release made.
>
> Perhaps trying to use release images for CI is not quite the right
> strategy?
>
>
> ---
> Sent from Workspace ONE Boxer <https://whatisworkspaceone.com/boxer>
>
> On February 17, 2021 at 1:24:04 PM PST, Mike Martell wrote:
> Please provide access to geode-native images on docker hub. Need to update
> our latest docker image to support our new open source CI for geode-native.
> In particular, need to upgrade to use cmake 3.18.
>
> Thanks,
> Mike
>
>


Re: Apache Geode CI upgrade

2021-11-11 Thread Sean Goller
To be clear, can you not see any pipelines at all, or can you just not
interact with them?

On Thu, Nov 11, 2021 at 2:35 AM Mario Salazar de Torres
 wrote:

> Hi everyone,
>
> After CI update, it seems that I can't see any geode/geode-native
> pipelines. Not even after authenticating with GitHub.
> So, is there anything we might need to do to have access to the pipelines
> after this update?
> Or do you need to grant us access? If so, my GH username is
> gaussianrecurrence.
>
> Thanks!
> Mario
> ____
> From: Sean Goller 
> Sent: Wednesday, November 10, 2021 8:05 PM
> To: dev@geode.apache.org 
> Subject: Re: Apache Geode CI upgrade
>
> The Apache Geode Concourse deployment upgrade is now complete. You should
> clear the DNS cache on your browser, or restart it completely in order to
> pick up the DNS changes.
>
> -Sean.
>
> On Wed, Nov 10, 2021 at 9:01 AM Robert Houghton 
> wrote:
>
> > We are upgrading the Apache Geode Concourse deployment. Pipelines will be
> > paused as we roll over to the new backend.
> >
> > Thanks you,
> > -Robert Houghton
> >
> > From: Robert Houghton 
> > Date: Thursday, November 4, 2021 at 3:24 PM
> > To: geode 
> > Subject: [Suspected Spam] Apache Geode CI upgrade
> > Hello Geode Developers,
> >
> > Next week, Wednesday, 10 November 2021, we are planning to upgrade the
> > Apache Geode Concourse deployment from version 6.7.4 to 7.6.0. We will
> also
> > be moving the backend for Concourse from Bosh to k8s. We will deploy all
> > production pipelines (Apache repo, develop and support/* branches).
> > Developer or personal pipelines will not be deployed. We are not able to
> > bring job history along.
> >
> > What this means to you:
> > Please anticipate some downtime in the develop pipelines, but PR checks
> > should continue to run as normal. Links from GitHub PRs to specific
> > Concourse jobs in the old deployment will break. If this happens to you,
> > change the domain from
> >
> > https://concourse.apachegeode-ci.info/
> > to
> > https://concourse-calcite.apachegeode-ci.info/
> > to reach the broken link. The team is brainstorming on how to not break
> > these links in the future.
> >
> >
> > Thank you,
> > -Robert Houghton
> >
>


Re: Apache Geode CI upgrade

2021-11-10 Thread Sean Goller
The Apache Geode Concourse deployment upgrade is now complete. You should
clear the DNS cache on your browser, or restart it completely in order to
pick up the DNS changes.

-Sean.

On Wed, Nov 10, 2021 at 9:01 AM Robert Houghton 
wrote:

> We are upgrading the Apache Geode Concourse deployment. Pipelines will be
> paused as we roll over to the new backend.
>
> Thanks you,
> -Robert Houghton
>
> From: Robert Houghton 
> Date: Thursday, November 4, 2021 at 3:24 PM
> To: geode 
> Subject: [Suspected Spam] Apache Geode CI upgrade
> Hello Geode Developers,
>
> Next week, Wednesday, 10 November 2021, we are planning to upgrade the
> Apache Geode Concourse deployment from version 6.7.4 to 7.6.0. We will also
> be moving the backend for Concourse from Bosh to k8s. We will deploy all
> production pipelines (Apache repo, develop and support/* branches).
> Developer or personal pipelines will not be deployed. We are not able to
> bring job history along.
>
> What this means to you:
> Please anticipate some downtime in the develop pipelines, but PR checks
> should continue to run as normal. Links from GitHub PRs to specific
> Concourse jobs in the old deployment will break. If this happens to you,
> change the domain from
> https://concourse.apachegeode-ci.info/
> to
> https://concourse-calcite.apachegeode-ci.info/
> to reach the broken link. The team is brainstorming on how to not break
> these links in the future.
>
>
> Thank you,
> -Robert Houghton
>


Temporary Instability

2021-12-02 Thread Sean Goller
Due to memory pressure, we need to increase the size of some nodes in the
CI infrastructure. This could cause some instability with Concourse, though
we don't expect any. This is just a heads up in case something happens.

-Sean.


Re: Temporary Instability

2021-12-02 Thread Sean Goller
The upgrade is complete, so everything should be working normally.

On Thu, Dec 2, 2021 at 2:05 PM Sean Goller  wrote:

> Due to memory pressure, we need to increase the size of some nodes in the
> CI infrastructure. This could cause some instability with Concourse, though
> we don't expect any. This is just a heads up in case something happens.
>
> -Sean.
>


Recent CI outage - resolved

2022-01-26 Thread Sean Goller
This morning we had an unexpected problem with Concourse. The database
volume filled up and locked everything up. The database volume has been
increased in size and everything should be functioning properly again. It's
highly likely that a few jobs failed because of this and we're looking into
it.

-Sean.