Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
jenkins is now coming down.


On Thu, Aug 28, 2014 at 7:19 AM, shane knapp skn...@berkeley.edu wrote:

 reminder:  this is starting in 10 minutes


 On Wed, Aug 27, 2014 at 4:13 PM, shane knapp skn...@berkeley.edu wrote:

 tomorrow morning i will be upgrading jenkins to the latest/greatest
 (1.577).

 at 730am, i will put jenkins in to a quiet period, so no new builds will
 be accepted.  once any running builds are finished, i will be taking
 jenkins down for the upgrade.

 depending on what and how many jobs are running, i'm expecting this to
 take, at most, an hour.

 i'll send out an update tomorrow morning right before i begin, and will
 send out updates and an all-clear once we're up and running again.

 1.577 release notes:
 http://jenkins-ci.org/changelog

 please let me know if there are any questions/concerns.  thanks in
 advance!

 shane
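
for reference, jenkins' quiet mode can be toggled over http as well as from
the manage-jenkins UI.  a rough sketch, assuming an admin user/API token and
a $JENKINS_URL pointing at the master (the variable names here are
placeholders, not our actual setup, and depending on the security config you
may also need a CSRF crumb):

  # stop accepting new builds; anything already running finishes normally
  curl -X POST -u "$ADMIN_USER:$API_TOKEN" "$JENKINS_URL/quietDown"

  # once the upgrade is done, start accepting builds again
  curl -X POST -u "$ADMIN_USER:$API_TOKEN" "$JENKINS_URL/cancelQuietDown"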





Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
jenkins is upgraded, but a few jobs sneaked in before i could do the plugin
updates.  i've put jenkins in quiet mode again, and once the spark builds
finish, i'll restart jenkins to enable the plugin updates and we'll be good
to go.

let's all take a moment to bask in the glory of the shiny new UI!  :)


On Thu, Aug 28, 2014 at 7:46 AM, shane knapp skn...@berkeley.edu wrote:

 jenkins is now coming down.


 On Thu, Aug 28, 2014 at 7:19 AM, shane knapp skn...@berkeley.edu wrote:

 reminder:  this is starting in 10 minutes


 On Wed, Aug 27, 2014 at 4:13 PM, shane knapp skn...@berkeley.edu wrote:

 tomorrow morning i will be upgrading jenkins to the latest/greatest
 (1.577).

 at 730am, i will put jenkins in to a quiet period, so no new builds will
 be accepted.  once any running builds are finished, i will be taking
 jenkins down for the upgrade.

 depending on what and how many jobs are running, i'm expecting this to
 take, at most, an hour.

 i'll send out an update tomorrow morning right before i begin, and will
 send out updates and an all-clear once we're up and running again.

 1.577 release notes:
 http://jenkins-ci.org/changelog

 please let me know if there are any questions/concerns.  thanks in
 advance!

 shane






Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
this one job is blocking the jenkins restart:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19406/

i'm about to kill it so that i can get this done.  i'll restart the job
after jenkins is back up.


On Thu, Aug 28, 2014 at 7:51 AM, shane knapp skn...@berkeley.edu wrote:

 jenkins is upgraded, but a few jobs sneaked in before i could do the
 plugin updates.  i've put jenkins in quiet mode again, and once the spark
 builds finish, i'll restart jenkins to enable the plugin updates and we'll
 be good to go.

 let's all take a moment to bask in the glory of the shiny new UI!  :)


 On Thu, Aug 28, 2014 at 7:46 AM, shane knapp skn...@berkeley.edu wrote:

 jenkins is now coming down.


 On Thu, Aug 28, 2014 at 7:19 AM, shane knapp skn...@berkeley.edu wrote:

 reminder:  this is starting in 10 minutes


 On Wed, Aug 27, 2014 at 4:13 PM, shane knapp skn...@berkeley.edu
 wrote:

 tomorrow morning i will be upgrading jenkins to the latest/greatest
 (1.577).

 at 730am, i will put jenkins in to a quiet period, so no new builds
 will be accepted.  once any running builds are finished, i will be taking
 jenkins down for the upgrade.

 depending on what and how many jobs are running, i'm expecting this to
 take, at most, an hour.

 i'll send out an update tomorrow morning right before i begin, and will
 send out updates and an all-clear once we're up and running again.

 1.577 release notes:
 http://jenkins-ci.org/changelog

 please let me know if there are any questions/concerns.  thanks in
 advance!

 shane







Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
all clear:  jenkins and all plugins have been updated!


On Thu, Aug 28, 2014 at 7:51 AM, shane knapp skn...@berkeley.edu wrote:

 jenkins is upgraded, but a few jobs sneaked in before i could do the
 plugin updates.  i've put jenkins in quiet mode again, and once the spark
 builds finish, i'll restart jenkins to enable the plugin updates and we'll
 be good to go.

 let's all take a moment to bask in the glory of the shiny new UI!  :)


 On Thu, Aug 28, 2014 at 7:46 AM, shane knapp skn...@berkeley.edu wrote:

 jenkins is now coming down.


 On Thu, Aug 28, 2014 at 7:19 AM, shane knapp skn...@berkeley.edu wrote:

 reminder:  this is starting in 10 minutes


 On Wed, Aug 27, 2014 at 4:13 PM, shane knapp skn...@berkeley.edu
 wrote:

 tomorrow morning i will be upgrading jenkins to the latest/greatest
 (1.577).

 at 730am, i will put jenkins in to a quiet period, so no new builds
 will be accepted.  once any running builds are finished, i will be taking
 jenkins down for the upgrade.

 depending on what and how many jobs are running, i'm expecting this to
 take, at most, an hour.

 i'll send out an update tomorrow morning right before i begin, and will
 send out updates and an all-clear once we're up and running again.

 1.577 release notes:
 http://jenkins-ci.org/changelog

 please let me know if there are any questions/concerns.  thanks in
 advance!

 shane







Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
no problem!

also, i retriggered:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19406
it's currently:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19411


On Thu, Aug 28, 2014 at 9:46 AM, Reynold Xin r...@databricks.com wrote:

 Thanks for doing this, Shane.


 On Thursday, August 28, 2014, shane knapp skn...@berkeley.edu wrote:

 all clear:  jenkins and all plugins have been updated!


 On Thu, Aug 28, 2014 at 7:51 AM, shane knapp skn...@berkeley.edu wrote:

  jenkins is upgraded, but a few jobs sneaked in before i could do the
  plugin updates.  i've put jenkins in quiet mode again, and once the
 spark
  builds finish, i'll restart jenkins to enable the plugin updates and
 we'll
  be good to go.
 
  let's all take a moment to bask in the glory of the shiny new UI!  :)
 
 
  On Thu, Aug 28, 2014 at 7:46 AM, shane knapp skn...@berkeley.edu
 wrote:
 
  jenkins is now coming down.
 
 
  On Thu, Aug 28, 2014 at 7:19 AM, shane knapp skn...@berkeley.edu
 wrote:
 
  reminder:  this is starting in 10 minutes
 
 
  On Wed, Aug 27, 2014 at 4:13 PM, shane knapp skn...@berkeley.edu
  wrote:
 
  tomorrow morning i will be upgrading jenkins to the latest/greatest
  (1.577).
 
  at 730am, i will put jenkins in to a quiet period, so no new builds
  will be accepted.  once any running builds are finished, i will be
 taking
  jenkins down for the upgrade.
 
  depending on what and how many jobs are running, i'm expecting this
 to
  take, at most, an hour.
 
  i'll send out an update tomorrow morning right before i begin, and
 will
  send out updates and an all-clear once we're up and running again.
 
  1.577 release notes:
  http://jenkins-ci.org/changelog
 
  please let me know if there are any questions/concerns.  thanks in
  advance!
 
  shane
 
 
 
 
 




emergency jenkins restart, aug 29th, 730am-9am PDT -- plus a postmortem

2014-08-28 Thread shane knapp
as with all software upgrades, sometimes things don't work as expected.

a recent change to stapler[1], to verbosely
report NotExportableExceptions[2], is spamming our jenkins log file with
stack traces, and the log file is growing rather quickly (1.2G since 9am).
this has been reported to the jenkins jira[3], and a fix has been pushed and
will be rolled out soon[4].

this isn't affecting any builds, and jenkins is happily humming along.

in the interim, so that we don't run out of disk space, i will be
redirecting the jenkins logs tomorrow morning to /dev/null for the long
weekend.

once a real fix has been released, i will update any packages needed and
redirect the logging back to the log file.
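
for the curious, the interim redirect is nothing fancy -- roughly something
like the following, assuming a stock red hat-style install where the init
script reads /etc/sysconfig/jenkins (paths and commands here are the distro
defaults, not necessarily exactly what i'll run):

  # temporary: send the log spam to /dev/null instead of the log file
  sed -i 's|^JENKINS_LOG=.*|JENKINS_LOG="/dev/null"|' /etc/sysconfig/jenkins
  service jenkins restart

  # once the stapler fix lands: point JENKINS_LOG back at
  # /var/log/jenkins/jenkins.log and restart again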

other than a short downtime, this will have no user-facing impact.

please let me know if you have any questions/concerns.

thanks for your patience!

shane the new guy  :)

[1] -- https://wiki.jenkins-ci.org/display/JENKINS/Architecture
[2] --
https://github.com/stapler/stapler/commit/ed2cb8b04c1514377f3a8bfbd567f050a67c6e1c
[3] --
https://issues.jenkins-ci.org/browse/JENKINS-24458?focusedCommentId=209247
[4] --
https://github.com/stapler/stapler/commit/e2b39098ca1f61a58970b8a41a3ae79053cf30e3


Re: emergency jenkins restart, aug 29th, 730am-9am PDT -- plus a postmortem

2014-08-29 Thread shane knapp
reminder:   this is happening right now.  jenkins is currently in quiet
mode, and in ~30 minutes, will be briefly going down.


On Thu, Aug 28, 2014 at 1:03 PM, shane knapp skn...@berkeley.edu wrote:

 as with all software upgrades, sometimes things don't always work as
 expected.

 a recent change to stapler[1], to verbosely
 report NotExportableExceptions[2] is spamming our jenkins log file with
 stack traces, which is growing rather quickly (1.2G since 9am).  this has
 been reported to the jenkins jira[3], and a fix has been pushed and will be
 rolled out soon[4].

 this isn't affecting any builds, and jenkins is happily humming along.

 in the interim, so that we don't run out of disk space, i will be
 redirecting the jenkins logs tomorrow morning to /dev/null for the long
 weekend.

 once a real fix has been released, i will update any packages needed and
 redirect the logging back to the log file.

 other than a short downtime, this will have no user-facing impact.

 please let me know if you have any questions/concerns.

 thanks for your patience!

 shane the new guy  :)

 [1] -- https://wiki.jenkins-ci.org/display/JENKINS/Architecture
 [2] --
 https://github.com/stapler/stapler/commit/ed2cb8b04c1514377f3a8bfbd567f050a67c6e1c
 [3] --
 https://issues.jenkins-ci.org/browse/JENKINS-24458?focusedCommentId=209247
 [4] --
 https://github.com/stapler/stapler/commit/e2b39098ca1f61a58970b8a41a3ae79053cf30e3



Re: emergency jenkins restart, aug 29th, 730am-9am PDT -- plus a postmortem

2014-08-29 Thread shane knapp
this is done.


On Fri, Aug 29, 2014 at 7:32 AM, shane knapp skn...@berkeley.edu wrote:

 reminder:   this is happening right now.  jenkins is currently in quiet
 mode, and in ~30 minutes, will be briefly going down.


 On Thu, Aug 28, 2014 at 1:03 PM, shane knapp skn...@berkeley.edu wrote:

 as with all software upgrades, sometimes things don't always work as
 expected.

 a recent change to stapler[1], to verbosely
 report NotExportableExceptions[2] is spamming our jenkins log file with
 stack traces, which is growing rather quickly (1.2G since 9am).  this has
 been reported to the jenkins jira[3], and a fix has been pushed and will be
 rolled out soon[4].

 this isn't affecting any builds, and jenkins is happily humming along.

 in the interim, so that we don't run out of disk space, i will be
 redirecting the jenkins logs tomorrow morning to /dev/null for the long
 weekend.

 once a real fix has been released, i will update any packages needed and
 redirect the logging back to the log file.

 other than a short downtime, this will have no user-facing impact.

 please let me know if you have any questions/concerns.

 thanks for your patience!

 shane the new guy  :)

 [1] -- https://wiki.jenkins-ci.org/display/JENKINS/Architecture
 [2] --
 https://github.com/stapler/stapler/commit/ed2cb8b04c1514377f3a8bfbd567f050a67c6e1c
 [3] --
 https://issues.jenkins-ci.org/browse/JENKINS-24458?focusedCommentId=209247
 [4] --
 https://github.com/stapler/stapler/commit/e2b39098ca1f61a58970b8a41a3ae79053cf30e3





new jenkins plugin installed and ready for use

2014-08-29 Thread shane knapp
i have always found the 'Rebuild' plugin super useful:
https://wiki.jenkins-ci.org/display/JENKINS/Rebuild+Plugin

this is installed and enabled.  enjoy!

shane


hey spark developers! intro from shane knapp, devops engineer @ AMPLab

2014-09-02 Thread shane knapp
so, i had a meeting w/the databricks guys on friday and they recommended i
send an email out to the list to say 'hi' and give you guys a quick intro.
 :)

hi!  i'm shane knapp, the new AMPLab devops engineer, and will be spending
time getting the jenkins build infrastructure up to production quality.
 much of this will be 'under the covers' work, like better system level
auth, backups, etc, but some will definitely be user facing:  timely
jenkins updates, debugging broken build infrastructure and some plugin
support.

i've been working in the bay area now since 1997 at many different
companies, and my last 10 years has been split between google and palantir.
 i'm a huge proponent of OSS, and am really happy to be able to help with
the work you guys are doing!

if anyone has any requests/questions/comments, feel free to drop me a line!

shane


Re: quick jenkins restart

2014-09-02 Thread shane knapp
and we're back and building!


On Tue, Sep 2, 2014 at 5:07 PM, shane knapp skn...@berkeley.edu wrote:

 since our queue is really short, i'm waiting for a couple of builds to
 finish and will be restarting jenkins to install/update some plugins.  the
 github pull request builder looks like it has some fixes to reduce spammy
 github calls, and reduce any potential rate limiting.

 i'll let everyone know when it's back up...  this should be super quick
 (~15 mins for tests to finish, ~2 mins for jenkins to restart).

 thanks in advance!

 shane



amplab jenkins is down

2014-09-04 Thread shane knapp
i am trying to get things up and running, but it looks like either the
firewall gateway or jenkins server itself is down.  i'll update as soon as
i know more.


Re: amplab jenkins is down

2014-09-04 Thread shane knapp
looks like a power outage in soda hall.  more updates as they happen.


On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote:

 i am trying to get things up and running, but it looks like either the
 firewall gateway or jenkins server itself is down.  i'll update as soon as
 i know more.



Re: amplab jenkins is down

2014-09-04 Thread shane knapp
looks like some hardware failed, and we're swapping in a replacement.  i
don't have more specific information yet -- including *what* failed, as our
sysadmin is super busy ATM.  the root cause was an incorrect circuit being
switched off during building maintenance.

on a side note, this incident will be accelerating our plan to move the
entire jenkins infrastructure in to a managed datacenter environment.  this
will be our major push over the next couple of weeks.  more details about
this, also, as soon as i get them.

i'm very sorry about the downtime, we'll get everything up and running ASAP.


On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu wrote:

 looks like a power outage in soda hall.  more updates as they happen.


 On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote:

 i am trying to get things up and running, but it looks like either the
 firewall gateway or jenkins server itself is down.  i'll update as soon as
 i know more.





Re: amplab jenkins is down

2014-09-04 Thread shane knapp
it's a faulty power switch on the firewall, which has been swapped out.
 we're about to reboot and be good to go.


On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu wrote:

 looks like some hardware failed, and we're swapping in a replacement.  i
 don't have more specific information yet -- including *what* failed, as our
 sysadmin is super busy ATM.  the root cause was an incorrect circuit being
 switched off during building maintenance.

 on a side note, this incident will be accelerating our plan to move the
 entire jenkins infrastructure in to a managed datacenter environment.  this
 will be our major push over the next couple of weeks.  more details about
 this, also, as soon as i get them.

 i'm very sorry about the downtime, we'll get everything up and running
 ASAP.


 On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu wrote:

 looks like a power outage in soda hall.  more updates as they happen.


 On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote:

 i am trying to get things up and running, but it looks like either the
 firewall gateway or jenkins server itself is down.  i'll update as soon as
 i know more.






Re: amplab jenkins is down

2014-09-04 Thread shane knapp
AND WE'RE UP!

sorry that this took so long...  i'll send out a more detailed explanation
of what happened soon.

now, off to back up jenkins.

shane


On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu wrote:

 it's a faulty power switch on the firewall, which has been swapped out.
  we're about to reboot and be good to go.


 On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu wrote:

 looks like some hardware failed, and we're swapping in a replacement.  i
 don't have more specific information yet -- including *what* failed, as our
 sysadmin is super busy ATM.  the root cause was an incorrect circuit being
 switched off during building maintenance.

 on a side note, this incident will be accelerating our plan to move the
 entire jenkins infrastructure in to a managed datacenter environment.  this
 will be our major push over the next couple of weeks.  more details about
 this, also, as soon as i get them.

 i'm very sorry about the downtime, we'll get everything up and running
 ASAP.


 On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu wrote:

 looks like a power outage in soda hall.  more updates as they happen.


 On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu
 wrote:

 i am trying to get things up and running, but it looks like either the
 firewall gateway or jenkins server itself is down.  i'll update as soon as
 i know more.







Re: amplab jenkins is down

2014-09-04 Thread shane knapp
looking


On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 It appears that our main man is having trouble
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/
  hearing new requests
 https://github.com/apache/spark/pull/2277#issuecomment-54549106.

 Do we need some smelling salts?


 On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu wrote:

 i'd ping the Jenkinsmench...  the master was completely offline, so any
 new
 jobs wouldn't have reached it.  any jobs that were queued when power was
 lost probably started up, but jobs that were running would fail.


 On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com
  wrote:

  Woohoo! Thanks Shane.
 
  Do you know if queued PR builds will automatically be picked up? Or do
 we
  have to ping the Jenkinmensch manually from each PR?
 
  Nick
 
 
  On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu
 wrote:
 
  AND WE'RE UP!
 
  sorry that this took so long...  i'll send out a more detailed
 explanation
  of what happened soon.
 
  now, off to back up jenkins.
 
  shane
 
 
  On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu
 wrote:
 
   it's a faulty power switch on the firewall, which has been swapped
 out.
we're about to reboot and be good to go.
  
  
   On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu
  wrote:
  
   looks like some hardware failed, and we're swapping in a
 replacement.
  i
   don't have more specific information yet -- including *what* failed,
  as our
   sysadmin is super busy ATM.  the root cause was an incorrect circuit
  being
   switched off during building maintenance.
  
   on a side note, this incident will be accelerating our plan to move
 the
   entire jenkins infrastructure in to a managed datacenter
 environment.
  this
   will be our major push over the next couple of weeks.  more details
  about
   this, also, as soon as i get them.
  
   i'm very sorry about the downtime, we'll get everything up and
 running
   ASAP.
  
  
   On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu
  wrote:
  
   looks like a power outage in soda hall.  more updates as they
 happen.
  
  
   On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu
   wrote:
  
   i am trying to get things up and running, but it looks like either
  the
   firewall gateway or jenkins server itself is down.  i'll update as
  soon as
   i know more.
  
  
  
  
  
 
 
 





Re: amplab jenkins is down

2014-09-04 Thread shane knapp
i'm going to restart jenkins and see if that fixes things.


On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu wrote:

 looking


 On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 It appears that our main man is having trouble
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/
  hearing new requests
 https://github.com/apache/spark/pull/2277#issuecomment-54549106.

 Do we need some smelling salts?


 On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu wrote:

 i'd ping the Jenkinsmench...  the master was completely offline, so any
 new
 jobs wouldn't have reached it.  any jobs that were queued when power was
 lost probably started up, but jobs that were running would fail.


 On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com
  wrote:

  Woohoo! Thanks Shane.
 
  Do you know if queued PR builds will automatically be picked up? Or do
 we
  have to ping the Jenkinmensch manually from each PR?
 
  Nick
 
 
  On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu
 wrote:
 
  AND WE'RE UP!
 
  sorry that this took so long...  i'll send out a more detailed
 explanation
  of what happened soon.
 
  now, off to back up jenkins.
 
  shane
 
 
  On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu
 wrote:
 
   it's a faulty power switch on the firewall, which has been swapped
 out.
we're about to reboot and be good to go.
  
  
   On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu
  wrote:
  
   looks like some hardware failed, and we're swapping in a
 replacement.
  i
   don't have more specific information yet -- including *what*
 failed,
  as our
   sysadmin is super busy ATM.  the root cause was an incorrect
 circuit
  being
   switched off during building maintenance.
  
   on a side note, this incident will be accelerating our plan to
 move the
   entire jenkins infrastructure in to a managed datacenter
 environment.
  this
   will be our major push over the next couple of weeks.  more details
  about
   this, also, as soon as i get them.
  
   i'm very sorry about the downtime, we'll get everything up and
 running
   ASAP.
  
  
   On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu
  wrote:
  
   looks like a power outage in soda hall.  more updates as they
 happen.
  
  
   On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu
 
   wrote:
  
   i am trying to get things up and running, but it looks like
 either
  the
   firewall gateway or jenkins server itself is down.  i'll update
 as
  soon as
   i know more.
  
  
  
  
  
 
 
 






Re: amplab jenkins is down

2014-09-04 Thread shane knapp
yep.  that's exactly the behavior i saw earlier, and i'll be figuring it out
first thing tomorrow morning.  i bet it's an environment issue on the
slaves.


On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 Looks like during the last build
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19797/console
 Jenkins was unable to execute a git fetch?


 On Thu, Sep 4, 2014 at 7:58 PM, shane knapp skn...@berkeley.edu wrote:

 i'm going to restart jenkins and see if that fixes things.


 On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu wrote:

 looking


 On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 It appears that our main man is having trouble
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/
  hearing new requests
 https://github.com/apache/spark/pull/2277#issuecomment-54549106.

 Do we need some smelling salts?


 On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu
 wrote:

 i'd ping the Jenkinsmench...  the master was completely offline, so
 any new
 jobs wouldn't have reached it.  any jobs that were queued when power
 was
 lost probably started up, but jobs that were running would fail.


 On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com
  wrote:

  Woohoo! Thanks Shane.
 
  Do you know if queued PR builds will automatically be picked up? Or
 do we
  have to ping the Jenkinmensch manually from each PR?
 
  Nick
 
 
  On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu
 wrote:
 
  AND WE'RE UP!
 
  sorry that this took so long...  i'll send out a more detailed
 explanation
  of what happened soon.
 
  now, off to back up jenkins.
 
  shane
 
 
  On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu
 wrote:
 
   it's a faulty power switch on the firewall, which has been
 swapped out.
we're about to reboot and be good to go.
  
  
   On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu
  wrote:
  
   looks like some hardware failed, and we're swapping in a
 replacement.
  i
   don't have more specific information yet -- including *what*
 failed,
  as our
   sysadmin is super busy ATM.  the root cause was an incorrect
 circuit
  being
   switched off during building maintenance.
  
   on a side note, this incident will be accelerating our plan to
 move the
   entire jenkins infrastructure in to a managed datacenter
 environment.
  this
   will be our major push over the next couple of weeks.  more
 details
  about
   this, also, as soon as i get them.
  
   i'm very sorry about the downtime, we'll get everything up and
 running
   ASAP.
  
  
   On Thu, Sep 4, 2014 at 12:27 PM, shane knapp 
 skn...@berkeley.edu
  wrote:
  
   looks like a power outage in soda hall.  more updates as they
 happen.
  
  
   On Thu, Sep 4, 2014 at 12:25 PM, shane knapp 
 skn...@berkeley.edu
   wrote:
  
   i am trying to get things up and running, but it looks like
 either
  the
   firewall gateway or jenkins server itself is down.  i'll
 update as
  soon as
   i know more.
  
  
  
  
  
 
 
 








Re: amplab jenkins is down

2014-09-05 Thread shane knapp
it's looking like everything except the pull request builders are working.
 i'm going to be working on getting this resolved today.


On Fri, Sep 5, 2014 at 8:18 AM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 Hmm, looks like at least some builds
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull
 are working now, though this last one was from ~5 hours ago.


 On Fri, Sep 5, 2014 at 1:02 AM, shane knapp skn...@berkeley.edu wrote:

 yep.  that's exactly the behavior i saw earlier, and i'll be figuring it out
 first thing tomorrow morning.  i bet it's an environment issue on the
 slaves.


 On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Looks like during the last build
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19797/console
 Jenkins was unable to execute a git fetch?


 On Thu, Sep 4, 2014 at 7:58 PM, shane knapp skn...@berkeley.edu wrote:

 i'm going to restart jenkins and see if that fixes things.


 On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu
 wrote:

 looking


 On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 It appears that our main man is having trouble
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/
  hearing new requests
 https://github.com/apache/spark/pull/2277#issuecomment-54549106.

 Do we need some smelling salts?


 On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu
 wrote:

 i'd ping the Jenkinsmench...  the master was completely offline, so
 any new
 jobs wouldn't have reached it.  any jobs that were queued when power
 was
 lost probably started up, but jobs that were running would fail.


 On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com
  wrote:

  Woohoo! Thanks Shane.
 
  Do you know if queued PR builds will automatically be picked up?
 Or do we
  have to ping the Jenkinmensch manually from each PR?
 
  Nick
 
 
  On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu
 wrote:
 
  AND WE'RE UP!
 
  sorry that this took so long...  i'll send out a more detailed
 explanation
  of what happened soon.
 
  now, off to back up jenkins.
 
  shane
 
 
  On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu
 wrote:
 
   it's a faulty power switch on the firewall, which has been
 swapped out.
we're about to reboot and be good to go.
  
  
   On Thu, Sep 4, 2014 at 1:19 PM, shane knapp 
 skn...@berkeley.edu
  wrote:
  
   looks like some hardware failed, and we're swapping in a
 replacement.
  i
   don't have more specific information yet -- including *what*
 failed,
  as our
   sysadmin is super busy ATM.  the root cause was an incorrect
 circuit
  being
   switched off during building maintenance.
  
   on a side note, this incident will be accelerating our plan to
 move the
   entire jenkins infrastructure in to a managed datacenter
 environment.
  this
   will be our major push over the next couple of weeks.  more
 details
  about
   this, also, as soon as i get them.
  
   i'm very sorry about the downtime, we'll get everything up and
 running
   ASAP.
  
  
   On Thu, Sep 4, 2014 at 12:27 PM, shane knapp 
 skn...@berkeley.edu
  wrote:
  
   looks like a power outage in soda hall.  more updates as they
 happen.
  
  
   On Thu, Sep 4, 2014 at 12:25 PM, shane knapp 
 skn...@berkeley.edu
   wrote:
  
   i am trying to get things up and running, but it looks like
 either
  the
   firewall gateway or jenkins server itself is down.  i'll
 update as
  soon as
   i know more.
  
  
  
  
  
 
 
 










yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-09 Thread shane knapp
since the power incident last thursday, the github pull request builder
plugin is still not really working 100%.  i found an open issue
w/jenkins[1] that could definitely be affecting us, so i will be pausing
builds early thursday morning and then restarting jenkins.
i'll send out a reminder tomorrow, and if this causes any problems for you,
please let me know and we can work out a better time.

but, now for some good news!  yesterday morning, we racked and stacked the
systems for the new jenkins instance in the berkeley datacenter.  tomorrow
i should be able to log in to them and start getting them set up and
configured.  this is a major step in getting us in to a much more
'production' style environment!

anyways:  thanks for your patience, and i think we've all learned that hard
powering down your build system is a definite recipe for disaster.  :)

shane

[1] -- https://issues.jenkins-ci.org/browse/JENKINS-22509


Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-10 Thread shane knapp
that's kinda what we're hoping as well.  :)

On Wed, Sep 10, 2014 at 2:46 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 I'm looking forward to this. :)

 Looks like Jenkins is having trouble triggering builds for new commits or
 after user requests (e.g.
 https://github.com/apache/spark/pull/2339#issuecomment-55165937).
 Hopefully that will be resolved tomorrow.

 Nick

 On Tue, Sep 9, 2014 at 5:00 PM, shane knapp skn...@berkeley.edu wrote:

 since the power incident last thursday, the github pull request builder
 plugin is still not really working 100%.  i found an open issue
 w/jenkins[1] that could definitely be affecting us, i will be pausing
 builds early thursday morning and then restarting jenkins.
 i'll send out a reminder tomorrow, and if this causes any problems for
 you,
 please let me know and we can work out a better time.

 but, now for some good news!  yesterday morning, we racked and stacked the
 systems for the new jenkins instance in the berkeley datacenter.  tomorrow
 i should be able to log in to them and start getting them set up and
 configured.  this is a major step in getting us in to a much more
 'production' style environment!

 anyways:  thanks for your patience, and i think we've all learned that
 hard
 powering down your build system is a definite recipe for disaster.  :)

 shane

 [1] -- https://issues.jenkins-ci.org/browse/JENKINS-22509





Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-11 Thread shane knapp
jenkins is now in quiet mode, and a restart is happening soon.

On Wed, Sep 10, 2014 at 3:44 PM, shane knapp skn...@berkeley.edu wrote:

 that's kinda what we're hoping as well.  :)

 On Wed, Sep 10, 2014 at 2:46 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 I'm looking forward to this. :)

 Looks like Jenkins is having trouble triggering builds for new commits or
 after user requests (e.g.
 https://github.com/apache/spark/pull/2339#issuecomment-55165937).
 Hopefully that will be resolved tomorrow.

 Nick

 On Tue, Sep 9, 2014 at 5:00 PM, shane knapp skn...@berkeley.edu wrote:

 since the power incident last thursday, the github pull request builder
 plugin is still not really working 100%.  i found an open issue
 w/jenkins[1] that could definitely be affecting us, i will be pausing
 builds early thursday morning and then restarting jenkins.
 i'll send out a reminder tomorrow, and if this causes any problems for
 you,
 please let me know and we can work out a better time.

 but, now for some good news!  yesterday morning, we racked and stacked
 the
 systems for the new jenkins instance in the berkeley datacenter.
 tomorrow
 i should be able to log in to them and start getting them set up and
 configured.  this is a major step in getting us in to a much more
 'production' style environment!

 anyways:  thanks for your patience, and i think we've all learned that
 hard
 powering down your build system is a definite recipe for disaster.  :)

 shane

 [1] -- https://issues.jenkins-ci.org/browse/JENKINS-22509






Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-11 Thread shane knapp
...and the restart is done.

On Thu, Sep 11, 2014 at 7:38 AM, shane knapp skn...@berkeley.edu wrote:

 jenkins is now in quiet mode, and a restart is happening soon.

 On Wed, Sep 10, 2014 at 3:44 PM, shane knapp skn...@berkeley.edu wrote:

 that's kinda what we're hoping as well.  :)

 On Wed, Sep 10, 2014 at 2:46 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 I'm looking forward to this. :)

 Looks like Jenkins is having trouble triggering builds for new commits
 or after user requests (e.g.
 https://github.com/apache/spark/pull/2339#issuecomment-55165937).
 Hopefully that will be resolved tomorrow.

 Nick

 On Tue, Sep 9, 2014 at 5:00 PM, shane knapp skn...@berkeley.edu wrote:

 since the power incident last thursday, the github pull request builder
 plugin is still not really working 100%.  i found an open issue
 w/jenkins[1] that could definitely be affecting us, i will be pausing
 builds early thursday morning and then restarting jenkins.
 i'll send out a reminder tomorrow, and if this causes any problems for
 you,
 please let me know and we can work out a better time.

 but, now for some good news!  yesterday morning, we racked and stacked
 the
 systems for the new jenkins instance in the berkeley datacenter.
 tomorrow
 i should be able to log in to them and start getting them set up and
 configured.  this is a major step in getting us in to a much more
 'production' style environment!

 anyways:  thanks for your patience, and i think we've all learned that
 hard
 powering down your build system is a definite recipe for disaster.  :)

 shane

 [1] -- https://issues.jenkins-ci.org/browse/JENKINS-22509







Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-11 Thread shane knapp
you can just click on 'rebuild', if you'd like.  what project specifically?
 (i had forgotten that i'd killed
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/557/,
which i just started a rebuild on)

On Thu, Sep 11, 2014 at 9:15 AM, Matthew Farrellee m...@redhat.com wrote:

 shane,

 is there anything we should do for pull requests that failed, but for
 unrelated issues?

 best,


 matt

 On 09/11/2014 11:29 AM, shane knapp wrote:

 ...and the restart is done.

 On Thu, Sep 11, 2014 at 7:38 AM, shane knapp skn...@berkeley.edu wrote:

  jenkins is now in quiet mode, and a restart is happening soon.

 On Wed, Sep 10, 2014 at 3:44 PM, shane knapp skn...@berkeley.edu
 wrote:

  that's kinda what we're hoping as well.  :)

 On Wed, Sep 10, 2014 at 2:46 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

  I'm looking forward to this. :)

 Looks like Jenkins is having trouble triggering builds for new commits
 or after user requests (e.g.
 https://github.com/apache/spark/pull/2339#issuecomment-55165937).
 Hopefully that will be resolved tomorrow.

 Nick

 On Tue, Sep 9, 2014 at 5:00 PM, shane knapp skn...@berkeley.edu
 wrote:

  since the power incident last thursday, the github pull request
 builder
 plugin is still not really working 100%.  i found an open issue
 w/jenkins[1] that could definitely be affecting us, i will be pausing
 builds early thursday morning and then restarting jenkins.
 i'll send out a reminder tomorrow, and if this causes any problems for
 you,
 please let me know and we can work out a better time.

 but, now for some good news!  yesterday morning, we racked and stacked
 the
 systems for the new jenkins instance in the berkeley datacenter.
 tomorrow
 i should be able to log in to them and start getting them set up and
 configured.  this is a major step in getting us in to a much more
 'production' style environment!

 anyways:  thanks for your patience, and i think we've all learned that
 hard
 powering down your build system is a definite recipe for disaster.  :)

 shane

 [1] -- https://issues.jenkins-ci.org/browse/JENKINS-22509










FYI: jenkins systems patched to fix bash exploit

2014-09-26 Thread shane knapp
all of our systems were affected by the shellshock bug, and i've just
patched everything w/the latest fix from redhat:

https://access.redhat.com/articles/1200223

we're not running bash.x86_64 0:4.1.2-15.el6_5.2 on all of our systems.
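
if you want to double-check a box yourself, the usual quick test for the
original CVE-2014-6271 looks like this (the exact fixed package version will
vary by distro and errata level):

  # an unpatched bash prints "vulnerable" before the echo; a patched one should not
  env x='() { :;}; echo vulnerable' bash -c 'echo shellshock test'

  # confirm which bash package is actually installed
  rpm -q bash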

shane


Re: FYI: jenkins systems patched to fix bash exploit

2014-09-26 Thread shane knapp


 we're not running bash.x86_64 0:4.1.2-15.el6_5.2 on all of our systems.

 s/not/now

:)


jenkins downtime/system upgrade wednesday morning, 730am PDT

2014-09-29 Thread shane knapp
happy monday, everyone!

remember a few weeks back when i upgraded jenkins, and unwittingly began
DOSing our system due to massive log spam?

well, that bug has been fixed w/the current release and i'd like to get our
logging levels back to something more verbose than we have now.

downtime will be from 730am-1000am PDT (i do expect this to be done well
before 1000am)

the update will be from 1.578 to 1.582

changelog here:  http://jenkins-ci.org/changelog

please let me know if there are any questions or concerns.  thanks!

shane, your friendly devops engineer


Re: FYI: i've doubled the jenkins executors for every build node

2014-09-29 Thread shane knapp
yeah, this is why i'm gonna keep a close eye on things this week...

as for VMs vs containers, please do the latter more than the former.  one
of our longer-term plans here at the lab is to move most of our jenkins
infra to VMs, and running tests w/nested VMs is Bad[tm].

On Mon, Sep 29, 2014 at 2:25 PM, Reynold Xin r...@databricks.com wrote:

 Thanks. We might see more failures due to contention on resources. Fingers
 acrossed ... At some point it might make sense to run the tests in a VM or
 container.


 On Mon, Sep 29, 2014 at 2:20 PM, shane knapp skn...@berkeley.edu wrote:

 we were running at 8 executors per node, and BARELY even stressing the
 machines (32 cores, ~230G RAM).

 in the interest of actually using system resources, and giving ourselves
 some headroom, i upped the executors to 16 per node.  i'll be keeping an
 eye on ganglia for the rest of the week to make sure everything's cool.

 i hope you all enjoy your freshly allocated capacity!  :)

 shane





Re: jenkins downtime/system upgrade wednesday morning, 730am PDT

2014-09-30 Thread shane knapp
https://issues.apache.org/jira/browse/SPARK-3745

On Tue, Sep 30, 2014 at 10:22 AM, shane knapp skn...@berkeley.edu wrote:

 (this time, reply to all)

 nice catch.  there's a bug in spark/dev/check-license, which i've
 confirmed from the CLI.  i'll open a bug and PR to fix it.

 On Mon, Sep 29, 2014 at 8:00 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

  Just noticed these lines in the jenkins log

 =
 Running Apache RAT checks
 =
 Attempting to fetch rat
 Launching rat from 
 /home/jenkins/workspace/SparkPullRequestBuilder/lib/apache-rat-0.10.jar
 Error: Invalid or corrupt jarfile 
 /home/jenkins/workspace/SparkPullRequestBuilder/lib/apache-rat-0.10.jar
 RAT checks passed.


 Something wrong?


 Best,


 --
 Nan Zhu

 On Monday, September 29, 2014 at 4:43 PM, shane knapp wrote:

 happy monday, everyone!

 remember a few weeks back when i upgraded jenkins, and unwittingly began
 DOSing our system due to massive log spam?

 well, that bug has been fixed w/the current release and i'd like to get
 our
 logging levels back to something more verbose than we have now.

 downtime will be from 730am-1000am PDT (i do expect this to be done well
 before 1000am)

 the update will be from 1.578 - 1.582

 changelog here: http://jenkins-ci.org/changelog

 please let me know if there are any questions or concerns. thanks!

 shane, your friendly devops engineer
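
re: the check-license failure quoted above -- the script was trusting the
downloaded rat jar and reporting success even when the download produced a
corrupt file.  a hypothetical guard (a sketch of the idea, not the actual
SPARK-3745 patch) would look something like:

  # bail out if the fetched jar isn't actually a readable jar
  # ($rat_jar is illustrative -- whatever path the script downloads to)
  if ! jar tf "$rat_jar" > /dev/null 2>&1; then
    echo "unable to fetch a usable apache-rat jar at $rat_jar" >&2
    exit 1
  fi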






Re: jenkins downtime/system upgrade wednesday morning, 730am PDT

2014-09-30 Thread shane knapp
reminder:  this is happening tomorrow morning.  i will be putting jenkins
in to quiet mode at ~7am, and then doing the upgrade once any stray builds
finish.

On Mon, Sep 29, 2014 at 1:43 PM, shane knapp skn...@berkeley.edu wrote:

 happy monday, everyone!

 remember a few weeks back when i upgraded jenkins, and unwittingly began
 DOSing our system due to massive log spam?

 well, that bug has been fixed w/the current release and i'd like to get
 our logging levels back to something more verbose than we have now.

 downtime will be from 730am-1000am PDT (i do expect this to be done well
 before 1000am)

 the update will be from 1.578 - 1.582

 changelog here:  http://jenkins-ci.org/changelog

 please let me know if there are any questions or concerns.  thanks!

 shane, your friendly devops engineer



Re: amplab jenkins is down

2014-10-01 Thread shane knapp
as of this morning, i've got the new jenkins up, with all of the current
builds set up (but failing).  i'm in the middle of playing setup/debug
whack-a-mole, but we're getting there.  my guess would be early next week
for the switchover.

On Wed, Oct 1, 2014 at 12:53 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 On Thu, Sep 4, 2014 at 4:19 PM, shane knapp skn...@berkeley.edu wrote:

 on a side note, this incident will be accelerating our plan to move the
 entire jenkins infrastructure in to a managed datacenter environment.
 this
 will be our major push over the next couple of weeks.  more details about
 this, also, as soon as i get them.


 Are there any updates on this move of the Jenkins infrastructure to a
 managed datacenter?

 I remember it being mentioned that another benefit of this move would be
 reduced flakiness when Jenkins tries to checkout patches for testing. For
 some reason, I'm getting a lot of those
 https://github.com/apache/spark/pull/2606#issuecomment-57514540 today.

 Nick



emergency jenkins restart -- massive security patch released

2014-10-03 Thread shane knapp
https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2014-10-01

there's some pretty big stuff that's been identified and we need to get
this upgraded asap.

i'll be killing off what's currently running, and will retrigger them all
once we're done.

sorry for the inconvenience.

shane


Re: emergency jenkins restart -- massive security patch released

2014-10-03 Thread shane knapp
update complete.  i'm retriggering builds now.

On Fri, Oct 3, 2014 at 10:51 AM, shane knapp skn...@berkeley.edu wrote:


 https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2014-10-01

 there's some pretty big stuff that's been identified and we need to get
 this upgraded asap.

 i'll be killing off what's currently running, and will retrigger them all
 once we're done.

 sorry for the inconvenience.

 shane



Re: new jenkins update + tentative release date

2014-10-13 Thread shane knapp
AND WE ARE LIIIVE!

https://amplab.cs.berkeley.edu/jenkins/

have at it, folks!

On Mon, Oct 13, 2014 at 10:15 AM, shane knapp skn...@berkeley.edu wrote:

 quick update:  we should be back up and running in the next ~60mins.

 On Mon, Oct 13, 2014 at 7:54 AM, shane knapp skn...@berkeley.edu wrote:

 Jenkins is in quiet mode and the move will be starting after i have my
 coffee.  :)

 On Sun, Oct 12, 2014 at 11:26 PM, Josh Rosen rosenvi...@gmail.com
 wrote:

 Reminder: this Jenkins migration is happening tomorrow morning (Monday).

 On Fri, Oct 10, 2014 at 1:01 PM, shane knapp skn...@berkeley.edu
 wrote:

 reminder:  this IS happening, first thing monday morning PDT.  :)

 On Wed, Oct 8, 2014 at 3:01 PM, shane knapp skn...@berkeley.edu
 wrote:

  greetings!
 
  i've got some updates regarding our new jenkins infrastructure, as
 well as
  the initial date and plan for rolling things out:
 
  *** current testing/build break whack-a-mole:
  a lot of out of date artifacts are cached in the current jenkins,
 which
  has caused a few builds during my testing to break due to dependency
  resolution failure[1][2].
 
  bumping these versions can cause your builds to fail, due to public
 api
  changes and the like.  consider yourself warned that some projects
 might
  require some debugging...  :)
 
  tomorrow, i will be at databricks working w/@joshrosen to make sure
 that
  the spark builds have any bugs hammered out.
 
  ***  deployment plan:
  unless something completely horrible happens, THE NEW JENKINS WILL GO
 LIVE
  ON MONDAY (october 13th).
 
  all jenkins infrastructure will be DOWN for the entirety of the day
  (starting at ~8am).  this means no builds, period.  i'm hoping that
 the
  downtime will be much shorter than this, but we'll have to see how
  everything goes.
 
  all test/build history WILL BE PRESERVED.  i will be rsyncing the
 jenkins
  jobs/ directory over, complete w/history as part of the deployment.
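
the history copy itself is basically just an rsync of the jobs tree from the
old master to the new one -- roughly, assuming the default $JENKINS_HOME of
/var/lib/jenkins (the hostname below is a placeholder):

  # jenkins on the destination should be stopped while this runs
  rsync -avP /var/lib/jenkins/jobs/ new-jenkins-master:/var/lib/jenkins/jobs/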
 
  once i'm feeling good about the state of things, i'll point the
 original
  url to the new instances and send out an all clear.
 
  if you are a student at UC berkeley, you can log in to jenkins using
 your
  LDAP login, and (by default) view but not change plans.  if you do
 not have
  a UC berkeley LDAP login, you can still view plans anonymously.
 
  IF YOU ARE A PLAN ADMIN, THEN PLEASE REACH OUT, ASAP, PRIVATELY AND I
 WILL
  SET UP ADMIN ACCESS TO YOUR BUILDS.
 
  ***  post deployment plan:
  fix all of the things that break!
 
  i will be keeping a VERY close eye on the builds, checking for
 breaks, and
  helping out where i can.  if the situation is dire, i can always roll
 back
  to the old jenkins infra...  but i hope we never get to that point!
 :)
 
  i'm hoping that things will go smoothly, but please be patient as i'm
  certain we'll hit a few bumps in the road.
 
  please let me know if you guys have any
 comments/questions/concerns...  :)
 
  shane
 
  1 - https://github.com/bigdatagenomics/bdg-services/pull/18
  2 - https://github.com/bigdatagenomics/avocado/pull/111
 







Re: new jenkins update + tentative release date

2014-10-13 Thread shane knapp
On Mon, Oct 13, 2014 at 2:28 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 Thanks for doing this work Shane.

 So is Jenkins in the new datacenter now? Do you know if the problems with
 checking out patches from GitHub should be resolved now? Here's an
 example from the past hour
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21702/console
 .


 yeah, i just noticed that we're still having the checkout issues.  i was
really hoping that the better network would just make this go away...
 guess i'll be doing a deeper dive now.

i would just up the timeout, but that's not coming out for a little while
yet:
https://issues.jenkins-ci.org/browse/JENKINS-20387

(we are currently running the latest -- 2.2.7, and the timeout field is
coming in 2.3, whenever that is)

i'll try and strace/replicate it locally as well.


Re: new jenkins update + tentative release date

2014-10-13 Thread shane knapp
ok, i found something that may help:
https://issues.jenkins-ci.org/browse/JENKINS-20445?focusedCommentId=195638&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195638

i set this to 20 minutes...  let's see if that helps.
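
for the record, the knob referenced in that comment is a JVM system property
on the master rather than anything in the individual job configs.  roughly
how it gets set (treat the exact property name as my assumption from that
thread, and the file path as the red hat default):

  # in /etc/sysconfig/jenkins, append to whatever java options are already
  # there, raising the git-client timeout from the default 10 minutes to 20:
  JENKINS_JAVA_OPTIONS="-Djava.awt.headless=true -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=20"

  # then restart jenkins so the new property is picked up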

On Mon, Oct 13, 2014 at 2:48 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 Ah, that sucks. Thank you for looking into this.

 On Mon, Oct 13, 2014 at 5:43 PM, shane knapp skn...@berkeley.edu wrote:

 On Mon, Oct 13, 2014 at 2:28 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Thanks for doing this work Shane.

 So is Jenkins in the new datacenter now? Do you know if the problems
 with checking out patches from GitHub should be resolved now? Here's an
 example from the past hour
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21702/console
 .


 yeah, i just noticed that we're still having the checkout issues.  i was
 really hoping that the better network would just make this go away...
  guess i'll be doing a deeper dive now.

 i would just up the timeout, but that's not coming out for a little while
 yet:
 https://issues.jenkins-ci.org/browse/JENKINS-20387

 (we are currently running the latest -- 2.2.7, and the timeout field is
 coming in 2.3, whenever that is)

 i'll try and strace/replicate it locally as well.






short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to see if
that helps w/the git fetch timeouts.

this will require a short downtime (~20 mins for builds to finish, ~20 mins
to downgrade), and will hopefully give us some insight in to wtf is going
on.

thanks for your patience...

shane


Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
ok, we're up and building...  :crossesfingersfortheumpteenthtime:

On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 I support this effort. :thumbsup:

 On Wed, Oct 15, 2014 at 4:52 PM, shane knapp skn...@berkeley.edu wrote:

 i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to see if
 that helps w/the git fetch timeouts.

 this will require a short downtime (~20 mins for builds to finish, ~20
 mins
 to downgrade), and will hopefully give us some insight in to wtf is going
 on.

 thanks for your patience...

 shane





Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
four builds triggered  and no timeouts.  :crossestoes:  :)

On Wed, Oct 15, 2014 at 2:19 PM, shane knapp skn...@berkeley.edu wrote:

 ok, we're up and building...  :crossesfingersfortheumpteenthtime:

 On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 I support this effort. :thumbsup:

 On Wed, Oct 15, 2014 at 4:52 PM, shane knapp skn...@berkeley.edu wrote:

 i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to see
 if
 that helps w/the git fetch timeouts.

 this will require a short downtime (~20 mins for builds to finish, ~20
 mins
 to downgrade), and will hopefully give us some insight in to wtf is going
 on.

 thanks for your patience...

 shane







Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
ok, we've had about 10 spark pull request builds go through w/o any git
timeouts.  it seems that the git timeout issue might be licked.

i will be definitely be keeping an eye on this for the next few days.

thanks for being patient!

shane

On Wed, Oct 15, 2014 at 2:27 PM, shane knapp skn...@berkeley.edu wrote:

 four builds triggered  and no timeouts.  :crossestoes:  :)

 On Wed, Oct 15, 2014 at 2:19 PM, shane knapp skn...@berkeley.edu wrote:

 ok, we're up and building...  :crossesfingersfortheumpteenthtime:

 On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 I support this effort. :thumbsup:

 On Wed, Oct 15, 2014 at 4:52 PM, shane knapp skn...@berkeley.edu
 wrote:

 i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to see
 if
 that helps w/the git fetch timeouts.

 this will require a short downtime (~20 mins for builds to finish, ~20
 mins
 to downgrade), and will hopefully give us some insight in to wtf is
 going
 on.

 thanks for your patience...

 shane








Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
the bad news is that we've had a couple more failures due to timeouts, but
the good news is that the frequency of these failures has decreased
significantly (3 in the past ~18hr).

seems like the git plugin downgrade has helped relieve the problem, but
hasn't fixed it.  i'll be looking in to this more today.

On Wed, Oct 15, 2014 at 7:05 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 A quick scan through the Spark PR board https://spark-prs.appspot.com/ shows
 no recent failures related to this git checkout problem.

 Looks promising!

 Nick

 On Wed, Oct 15, 2014 at 6:10 PM, shane knapp skn...@berkeley.edu wrote:

 ok, we've had about 10 spark pull request builds go through w/o any git
 timeouts.  it seems that the git timeout issue might be licked.

 i will be definitely be keeping an eye on this for the next few days.

 thanks for being patient!

 shane

 On Wed, Oct 15, 2014 at 2:27 PM, shane knapp skn...@berkeley.edu wrote:

  four builds triggered  and no timeouts.  :crossestoes:  :)
 
  On Wed, Oct 15, 2014 at 2:19 PM, shane knapp skn...@berkeley.edu
 wrote:
 
  ok, we're up and building...  :crossesfingersfortheumpteenthtime:
 
  On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas 
  nicholas.cham...@gmail.com wrote:
 
  I support this effort. :thumbsup:
 
  On Wed, Oct 15, 2014 at 4:52 PM, shane knapp skn...@berkeley.edu
  wrote:
 
  i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to
 see
  if
  that helps w/the git fetch timeouts.
 
  this will require a short downtime (~20 mins for builds to finish,
 ~20
  mins
  to downgrade), and will hopefully give us some insight in to wtf is
  going
  on.
 
  thanks for your patience...
 
  shane
 
 
 
 
 
 





Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
yeah, at this point it might be worth trying.  :)

the absolutely irritating thing is that i am not seeing this happen w/any
jobs other than the spark prb, nor does it seem to correlate w/time
of day, network or system load, or what slave it runs on.  nor are we
hitting our limit of connections on github.  i really, truly hate
non-deterministic failures.

i'm also going to write an email to support@github and see if they have any
insight in to this as well.
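
(if we do end up scripting the fetch ourselves in run-tests-jenkins, a
minimal retry wrapper could look something like this -- just a sketch,
nothing i've tested on the workers, and the timeout/retry values are
placeholders:)

snip
# sketch: fetch only the one PR's refs, with retries, instead of relying
# on the git plugin.  PR number comes in as $1; 900s/3 tries are guesses.
fetch_pr_with_retries() {
  local pr="$1" attempt
  for attempt in 1 2 3; do
    if timeout 900 git fetch --tags --progress \
        https://github.com/apache/spark.git \
        "+refs/pull/${pr}/*:refs/remotes/origin/pr/${pr}/*"; then
      return 0
    fi
    echo "git fetch for PR ${pr} timed out (attempt ${attempt}), retrying..."
    sleep 30
  done
  return 1
}
/snip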

On Thu, Oct 16, 2014 at 12:51 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 Thanks for continuing to look into this, Shane.

 One suggestion that Patrick brought up, if we have trouble getting to the
 bottom of this, is doing the git checkout ourselves in the
 run-tests-jenkins script and cutting out the Jenkins git plugin entirely.
 That way we can script retries and post friendlier messages about timeouts
 if they still occur by ourselves.

 Do you think that’s worth trying at some point?

 Nick
 ​

 On Thu, Oct 16, 2014 at 2:04 PM, shane knapp skn...@berkeley.edu wrote:

 the bad news is that we've had a couple more failures due to timeouts,
 but the good news is that the frequency that these happen has decreased
 significantly (3 in the past ~18hr).

 seems like the git plugin downgrade has helped relieve the problem, but
 hasn't fixed it.  i'll be looking in to this more today.

 On Wed, Oct 15, 2014 at 7:05 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 A quick scan through the Spark PR board https://spark-prs.appspot.com/ 
 shows
 no recent failures related to this git checkout problem.

 Looks promising!

 Nick

 On Wed, Oct 15, 2014 at 6:10 PM, shane knapp skn...@berkeley.edu
 wrote:

 ok, we've had about 10 spark pull request builds go through w/o any git
 timeouts.  it seems that the git timeout issue might be licked.

 i will be definitely be keeping an eye on this for the next few days.

 thanks for being patient!

 shane

 On Wed, Oct 15, 2014 at 2:27 PM, shane knapp skn...@berkeley.edu
 wrote:

  four builds triggered  and no timeouts.  :crossestoes:  :)
 
  On Wed, Oct 15, 2014 at 2:19 PM, shane knapp skn...@berkeley.edu
 wrote:
 
  ok, we're up and building...  :crossesfingersfortheumpteenthtime:
 
  On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas 
  nicholas.cham...@gmail.com wrote:
 
  I support this effort. :thumbsup:
 
  On Wed, Oct 15, 2014 at 4:52 PM, shane knapp skn...@berkeley.edu
  wrote:
 
  i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2)
 to see
  if
  that helps w/the git fetch timeouts.
 
  this will require a short downtime (~20 mins for builds to finish,
 ~20
  mins
  to downgrade), and will hopefully give us some insight in to wtf is
  going
  on.
 
  thanks for your patience...
 
  shane
 
 
 
 
 
 







Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread shane knapp
ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which
fixed the SparkR build but apparently made Spark itself quite unhappy.  i
removed that JDK, triggered a build (
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21943/console),
and it compiled kinesis w/o dying a fiery death.

apparently 7u71 is stricter when compiling.  sad times.

sorry about that!

shane


On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell pwend...@gmail.com wrote:

 The failure is in the Kinesis component, can you reproduce this if you
 build with -Pkinesis-asl?

 - Patrick

 On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote:
  hmm, strange.  i'll take a look.
 
  On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
 
  yes, I can compile locally, too
 
  but it seems that Jenkins is not happy now...
  https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
 
  All failed to compile
 
  Best,
 
  --
  Nan Zhu
 
 
  On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:
 
   I performed build on latest master branch but didn't get compilation
  error.
  
   FYI
  
   On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com
  (mailto:zhunanmcg...@gmail.com) wrote:
Hi,
   
I just submitted a patch
  https://github.com/apache/spark/pull/2864/files
with one line change
   
but the Jenkins told me it's failed to compile on the unrelated
 files?
   
   
 
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console
   
   
Best,
   
Nan
  
 
 



Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread shane knapp
thanks, patrick!

:)

On Mon, Oct 20, 2014 at 5:35 PM, Patrick Wendell pwend...@gmail.com wrote:

 I created an issue to fix this:

 https://issues.apache.org/jira/browse/SPARK-4021

 On Mon, Oct 20, 2014 at 5:32 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Thanks Shane - we should fix the source code issues in the Kinesis
  code that made stricter Java compilers reject it.
 
  - Patrick
 
  On Mon, Oct 20, 2014 at 5:28 PM, shane knapp skn...@berkeley.edu
 wrote:
  ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which
  fixed the SparkR build but apparently made Spark itself quite unhappy.
 i
  removed that JDK, triggered a build
  (
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21943/console
 ),
  and it compiled kinesis w/o dying a fiery death.
 
  apparently 7u71 is stricter when compiling.  sad times.
 
  sorry about that!
 
  shane
 
 
  On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  The failure is in the Kinesis component, can you reproduce this if you
  build with -Pkinesis-asl?
 
  - Patrick
 
  On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu
 wrote:
   hmm, strange.  i'll take a look.
  
   On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com
 wrote:
  
   yes, I can compile locally, too
  
   but it seems that Jenkins is not happy now...
   https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
  
   All failed to compile
  
   Best,
  
   --
   Nan Zhu
  
  
   On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:
  
I performed build on latest master branch but didn't get
 compilation
   error.
   
FYI
   
On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com
   (mailto:zhunanmcg...@gmail.com) wrote:
 Hi,

 I just submitted a patch
   https://github.com/apache/spark/pull/2864/files
 with one line change

 but the Jenkins told me it's failed to compile on the unrelated
 files?


  
  
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console


 Best,

 Nan
   
  
  
 
 



Re: something wrong with Jenkins or something untested merged?

2014-10-21 Thread shane knapp
i'm currently in a meeting and will be starting to do some tests in ~1 hour
or so.

On Tue, Oct 21, 2014 at 11:07 AM, Nan Zhu zhunanmcg...@gmail.com wrote:

 I agree with Sean

 I just compiled spark core successfully with 7u71 in Mac OS X

 On Tue, Oct 21, 2014 at 1:11 PM, Josh Rosen rosenvi...@gmail.com wrote:

 Ah, that makes sense.  I had forgotten that there was a JIRA for this:

 https://issues.apache.org/jira/browse/SPARK-4021

 On October 21, 2014 at 10:08:58 AM, Patrick Wendell (pwend...@gmail.com)
 wrote:

 Josh - the errors that broke our build indicated that JDK5 was being
 used. Somehow the upgrade caused our build to use a much older Java
 version. See the JIRA for more details.

 On Tue, Oct 21, 2014 at 10:05 AM, Josh Rosen rosenvi...@gmail.com
 wrote:
  I find it concerning that there's a JDK version that breaks our build,
 since
  we're supposed to support Java 7. Is 7u71 an upgrade or downgrade from
 the
  JDK that we used before? Is there an easy way to fix our build so that
 it
  compiles with 7u71's stricter settings?
 
  I'm not sure why the New PRB is failing here. It was originally
 created
  as a clone of the main pull request builder job. I checked the
 configuration
  history and confirmed that there aren't any settings that we've
 forgotten to
  copy over (e.g. their configurations haven't diverged), so I'm not sure
  what's causing this.
 
  - Josh
 
  On October 21, 2014 at 6:35:39 AM, Nan Zhu (zhunanmcg...@gmail.com)
 wrote:
 
  weird.  two builds (one triggered by New, one triggered by Old)
 were
  executed in the same node, amp-jenkins-slave-01, one compiles, one
 not...
 
  Best,
 
  --
  Nan Zhu
 
 
  On Tuesday, October 21, 2014 at 9:39 AM, Nan Zhu wrote:
 
  seems that all PRs built by NewSparkPRBuilder suffer from 7u71, while
  SparkPRBuilder is working fine
 
  Best,
 
  --
  Nan Zhu
 
 
  On Tuesday, October 21, 2014 at 9:22 AM, Cheng Lian wrote:
 
   It's a new pull request builder written by Josh, integrated into our
   state-of-the-art PR dashboard :)
  
   On 10/21/14 9:33 PM, Nan Zhu wrote:
just curious...what is this NewSparkPullRequestBuilder?
   
Best,
   
--
Nan Zhu
   
   
On Tuesday, October 21, 2014 at 8:30 AM, Cheng Lian wrote:
   

 Hm, seems that 7u71 comes back again. Observed similar Kinesis
 compilation error just now:

 https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/410/consoleFull


 Checked Jenkins slave nodes, saw /usr/java/latest points to
 jdk1.7.0_71. However, /usr/bin/javac -version says:

 
  Eclipse Java Compiler 0.894_R34x, 3.4.2 release, Copyright IBM
  Corp 2000, 2008. All rights reserved.
 


 Which JDK is actually used by Jenkins?


 Cheng


 On 10/21/14 8:28 AM, shane knapp wrote:

  ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which
  fixed the SparkR build but apparently made Spark itself quite unhappy.
  i removed that JDK, triggered a build (
  https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21943/console),
  and it compiled kinesis w/o dying a fiery death.  apparently 7u71 is
  stricter when compiling.  sad times.  sorry about that!  shane

  On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell pwend...@gmail.com wrote:

   The failure is in the Kinesis component, can you reproduce this if you
   build with -Pkinesis-asl?  - Patrick

   On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote:

    hmm, strange.  i'll take a look.

    On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

     yes, I can compile locally, too but it seems that Jenkins is not
     happy now...
     https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
     All failed to compile

     Best,
     -- Nan Zhu

     On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:

      I performed build on latest master branch but didn't get
      compilation error.

      FYI

      On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

       Hi, I just submitted a patch
       https://github.com/apache/spark/pull/2864/files
       with one line change

       but the Jenkins told me it's failed to compile on the unrelated
       files?

       https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console

       Best, Nan

Re: something wrong with Jenkins or something untested merged?

2014-10-21 Thread shane knapp
ok, i did some testing and found out what's happening.

https://issues.apache.org/jira/browse/SPARK-4021

here's the TL;DR:
jenkins ignores what JDKs are installed via the web interface when there's
more than one defined, and falls back to whatever is default on the slave
the test is run on.  in this case, it's openjdk 7u65...  and spark
compilation fails.  i've removed the 2nd JDK (7u71) from jenkins, and
everything is back to normal.
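
(for future reference, a quick sanity check like this at the top of a
build would have caught the fallback immediately -- just a sketch,
assuming the standard centos java symlinks:)

snip
# sketch: print which jdk the build is actually using and warn loudly if
# it isn't the one we think we configured (version string is a placeholder)
echo "JAVA_HOME=${JAVA_HOME:-unset}"
which java javac
java -version 2>&1
readlink -f /usr/java/latest || true
java -version 2>&1 | grep -q "1\.7\.0_71" || echo "WARNING: unexpected JDK in use"
/snip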

On Tue, Oct 21, 2014 at 11:51 AM, shane knapp skn...@berkeley.edu wrote:

 i'm currently in a meeting and will be starting to do some tests in ~1
 hour or so.

 On Tue, Oct 21, 2014 at 11:07 AM, Nan Zhu zhunanmcg...@gmail.com wrote:

 I agree with Sean

 I just compiled spark core successfully with 7u71 in Mac OS X

 On Tue, Oct 21, 2014 at 1:11 PM, Josh Rosen rosenvi...@gmail.com wrote:

 Ah, that makes sense.  I had forgotten that there was a JIRA for this:

 https://issues.apache.org/jira/browse/SPARK-4021

 On October 21, 2014 at 10:08:58 AM, Patrick Wendell (pwend...@gmail.com)
 wrote:

 Josh - the errors that broke our build indicated that JDK5 was being
 used. Somehow the upgrade caused our build to use a much older Java
 version. See the JIRA for more details.

 On Tue, Oct 21, 2014 at 10:05 AM, Josh Rosen rosenvi...@gmail.com
 wrote:
  I find it concerning that there's a JDK version that breaks our build,
 since
  we're supposed to support Java 7. Is 7u71 an upgrade or downgrade from
 the
  JDK that we used before? Is there an easy way to fix our build so that
 it
  compiles with 7u71's stricter settings?
 
  I'm not sure why the New PRB is failing here. It was originally
 created
  as a clone of the main pull request builder job. I checked the
 configuration
  history and confirmed that there aren't any settings that we've
 forgotten to
  copy over (e.g. their configurations haven't diverged), so I'm not
 sure
  what's causing this.
 
  - Josh
 
  On October 21, 2014 at 6:35:39 AM, Nan Zhu (zhunanmcg...@gmail.com)
 wrote:
 
  weird.  two builds (one triggered by New, one triggered by Old)
 were
  executed in the same node, amp-jenkins-slave-01, one compiles, one
 not...
 
  Best,
 
  --
  Nan Zhu
 
 
  On Tuesday, October 21, 2014 at 9:39 AM, Nan Zhu wrote:
 
  seems that all PRs built by NewSparkPRBuilder suffer from 7u71,
 while
  SparkPRBuilder is working fine
 
  Best,
 
  --
  Nan Zhu
 
 
  On Tuesday, October 21, 2014 at 9:22 AM, Cheng Lian wrote:
 
   It's a new pull request builder written by Josh, integrated into
 our
   state-of-the-art PR dashboard :)
  
   On 10/21/14 9:33 PM, Nan Zhu wrote:
just curious...what is this NewSparkPullRequestBuilder?
   
Best,
   
--
Nan Zhu
   
   
On Tuesday, October 21, 2014 at 8:30 AM, Cheng Lian wrote:
   

 Hm, seems that 7u71 comes back again. Observed similar Kinesis
 compilation error just now:

 https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/410/consoleFull


 Checked Jenkins slave nodes, saw /usr/java/latest points to
 jdk1.7.0_71. However, /usr/bin/javac -version says:

 
  Eclipse Java Compiler 0.894_R34x, 3.4.2 release, Copyright
 IBM
  Corp 2000, 2008. All rights reserved.
 


 Which JDK is actually used by Jenkins?


 Cheng


 On 10/21/14 8:28 AM, shane knapp wrote:

   ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which
   fixed the SparkR build but apparently made Spark itself quite unhappy.
   i removed that JDK, triggered a build (
   https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21943/console),
   and it compiled kinesis w/o dying a fiery death.  apparently 7u71 is
   stricter when compiling.  sad times.  sorry about that!  shane

   On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell pwend...@gmail.com wrote:

    The failure is in the Kinesis component, can you reproduce this if
    you build with -Pkinesis-asl?  - Patrick

    On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote:

     hmm, strange.  i'll take a look.

     On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

      yes, I can compile locally, too but it seems that Jenkins is not
      happy now...
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
      All failed to compile

      Best,
      -- Nan Zhu

      On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:

       I performed build on latest master branch but didn't get
       compilation error.

       FYI

       On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

        Hi, I just submitted a patch

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-21 Thread shane knapp
i've seen a few more builds fail w/timeouts and it appears that we're
definitely NOT hitting any rate limiting.

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22005/console

[jenkins@amp-jenkins-slave-01 ~]$ curl -i -H "Authorization: token REDACTED" https://api.github.com | grep Rate
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4997
X-RateLimit-Reset: 1413929848
Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit,
X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes,
X-Accepted-OAuth-Scopes, X-Poll-Interval
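
(if we want a running history to line up against the timeouts, something
like this in cron would do it -- a sketch only; the token file path and
log location are made up:)

snip
# sketch: log the github rate limit headers once a minute so we can
# correlate them with build failures later
TOKEN=$(cat ~/.github_token)   # placeholder path
echo "$(date -u +%FT%TZ) $(curl -si -H "Authorization: token ${TOKEN}" \
  https://api.github.com/rate_limit \
  | grep -E '^X-RateLimit-(Limit|Remaining):' | tr -d '\r' | xargs)" \
  >> /var/log/github-ratelimit.log
/snip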

On Sat, Oct 18, 2014 at 12:44 AM, Davies Liu dav...@databricks.com wrote:

 Cool, the 4 recent builds used the new configs, thanks!

 Let's run more builds.

 Davies

 On Fri, Oct 17, 2014 at 11:06 PM, Josh Rosen rosenvi...@gmail.com wrote:
  I think that the fix was applied.  Take a look at
 
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21874/consoleFull
 
  Here, I see a fetch command that mentions this specific PR branch rather
  than the wildcard that we had before:
 
git fetch --tags --progress https://github.com/apache/spark.git
  +refs/pull/2840/*:refs/remotes/origin/pr/2840/* # timeout=15
 
 
  Do you have an example of a Spark PRB build that’s still failing with the
  old fetch failure?
 
  - Josh
 
  On October 17, 2014 at 11:03:14 PM, Davies Liu (dav...@databricks.com)
  wrote:
 
  How can we know the changes have been applied? I had checked several
  recent builds, they all use the original configs.
 
  Davies
 
  On Fri, Oct 17, 2014 at 6:17 PM, Josh Rosen rosenvi...@gmail.com
 wrote:
  FYI, I edited the Spark Pull Request Builder job to try this out. Let’s
  see
  if it works (I’ll be around to revert if it doesn’t).
 
  On October 17, 2014 at 5:26:56 PM, Davies Liu (dav...@databricks.com)
  wrote:
 
  One finding is that all the timeout happened with this command:
 
  git fetch --tags --progress https://github.com/apache/spark.git
  +refs/pull/*:refs/remotes/origin/pr/*
 
  I'm thinking that this may be an expensive call; we could try to
  use a cheaper one:
 
  git fetch --tags --progress https://github.com/apache/spark.git
  +refs/pull/XXX/*:refs/remotes/origin/pr/XXX/*
 
  XXX is the PullRequestID,
 
  The configuration support parameters [1], so we could put this in :
 
  +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 
  I have not tested this yet, could you give this a try?
 
  Davies
 
 
  [1]
 
 
 https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin
 
  On Fri, Oct 17, 2014 at 5:00 PM, shane knapp skn...@berkeley.edu
 wrote:
  actually, nvm, you have to run that command from our servers to
 affect
  our limit. run it all you want from your own machines! :P
 
  On Fri, Oct 17, 2014 at 4:59 PM, shane knapp skn...@berkeley.edu
 wrote:
 
  yep, and i will tell you guys ONLY if you promise to NOT try this
  yourselves... checking the rate limit also counts as a hit and
  increments
  our numbers:
 
  # curl -i https://api.github.com/users/whatever 2>/dev/null | egrep ^X-Rate
  X-RateLimit-Limit: 60
  X-RateLimit-Remaining: 51
  X-RateLimit-Reset: 1413590269
 
  (yes, that is the exact url that they recommended on the github site
  lol)
 
  so, earlier today, we had a spark build fail w/a git timeout at
 10:57am,
  but there were only ~7 builds run that hour, so that points to us NOT
  hitting the rate limit... at least for this fail. whee!
 
  is it beer-thirty yet?
 
  shane
 
 
 
  On Fri, Oct 17, 2014 at 4:52 PM, Nicholas Chammas 
  nicholas.cham...@gmail.com wrote:
 
  Wow, thanks for this deep dive Shane. Is there a way to check if we
 are
  getting hit by rate limiting directly, or do we need to contact
 GitHub
  for that?
 
  On Friday, October 17, 2014, shane knapp skn...@berkeley.edu wrote:
 
  quick update:
 
  here are some stats i scraped over the past week of ALL pull request
  builder projects and timeout failures. due to the large number of
  spark
  ghprb jobs, i don't have great records earlier than oct 7th. the
 data
  is
  current up until ~230pm today:
 
  spark and new spark ghprb total builds vs git fetch timeouts:
  $ for x in 10-{09..17}; do
      passed=$(grep $x SORTED.passed | grep -i spark | wc -l)
      failed=$(grep $x SORTED | grep -i spark | wc -l)
      let total=passed+failed
      fail_percent=$(echo "scale=2; $failed/$total" | bc | sed 's/^\.//g')
      line="$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
      echo -e "$line"
    done
  10-09 -- total builds: 140 p/f: 92/48 fail%: 34%
  10-10 -- total builds: 65 p/f: 59/6 fail%: 09%
  10-11 -- total builds: 29 p/f: 29/0 fail%: 0%
  10-12 -- total builds: 24 p/f: 21/3 fail%: 12%
  10-13 -- total builds: 39 p/f: 35/4 fail%: 10%
  10-14 -- total builds: 7 p/f: 5/2 fail%: 28%
  10-15 -- total builds: 37 p/f: 34/3 fail%: 08%
  10-16 -- total builds: 71 p/f: 59/12 fail%: 16%
  10-17 -- total builds: 26 p/f: 20/6 fail%: 23%
 
  all other ghprb builds vs git fetch timeouts

your weekly git timeout update! TL;DR: i'm now almost certain we're not hitting rate limits.

2014-10-24 Thread shane knapp
so, things look like they've stabilized significantly over the past 10
days, and without any changes on our end:
snip
$ /root/tools/get_timeouts.sh 10
timeouts by date:
2014-10-14 -- 2
2014-10-16 -- 1
2014-10-19 -- 1
2014-10-20 -- 2
2014-10-23 -- 5

timeouts by project:
  5 NewSparkPullRequestBuilder
  5 SparkPullRequestBuilder
  1 Tachyon-Pull-Request-Builder
total builds (excepting aborted by a user):
602

total percentage of builds timing out:
01
/snip

the NewSparkPullRequestBuilder failures are spread over five different days
(10-14 through 10-20), and the SparkPullRequestBuilder failures all
happened yesterday.  there were a LOT of SparkPullRequestBuilder builds
yesterday (60), and the failures happened during these hours (first number
== number of builds failed, second number == hour of the day):
snip
$ cat timeouts-102414-130817 | grep SparkPullRequestBuilder | grep
2014-10-23 | awk '{print $3}' | awk -F: '{print $1}' | sort | uniq -c
  1 03
  2 20
  1 22
  1 23
/snip

however, the number of total SparkPullRequestBuilder builds during these
times doesn't seem egregious:
snip
  4 03
  9 20
  4 22
  9 23
/snip

nor does the total for ALL builds at those times:
snip
  5 03
  9 20
  7 22
 11 23
/snip

9 builds was the largest number of SparkPullRequestBuilder builds per hour,
but there were other hours with 5, 6 or 7 builds/hour that didn't have a
timeout issue.

in fact, hour 16 (4pm) had the most builds running total yesterday, which
includes 7 SparkPullRequestBuilder builds, and nothing timed out.

most of the pull request builder hits on github are authenticated w/an
oauth token.  this gives us 5000 hits/hour, and unauthed gives us 60/hour.

in conclusion:  there is no way we are hitting github often enough to be
rate limited.  i think i've finally ruled that out completely.  :)
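
(for anyone who wants to poke at the raw data, the scan basically boils
down to grepping the console logs on the master -- a sketch; the jenkins
home path and the exact timeout string are the usual defaults and may
differ slightly on our master:)

snip
# sketch: count PRB builds whose console log hit the git fetch timeout,
# grouped by day.  /var/lib/jenkins is an assumption -- use $JENKINS_HOME.
cd /var/lib/jenkins/jobs/SparkPullRequestBuilder/builds
grep -l "ERROR: Timeout after" */log 2>/dev/null | while read f; do
  date -r "$f" +%F    # mtime of the failing build's console log
done | sort | uniq -c
/snip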


jenkins downtime tomorrow morning ~6am-8am PDT

2014-10-27 Thread shane knapp
i'll be bringing jenkins down tomorrow morning for some system maintenance
and to get our backups kicked off.

i do expect to have the system back up and running before 8am.

please let me know ASAP if i need to reschedule this.

thanks,

shane


jenkins emergency restart now, was Re: jenkins downtime tomorrow morning ~6am-8am PDT

2014-10-27 Thread shane knapp
so, i'm having a race condition between a plugin i installed putting
jenkins in to quiet mode and it failing to perform a backup from this past
weekend.  i'll need to restart the process and get it out of the
constantly-in-to-quiet-mode cycle it's in now.
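
(for the curious, knocking the master out of a stuck quiet-down is just a
POST to the master -- a sketch, assuming an admin user and api token:)

snip
# sketch: clear a stuck quiet-down state; USER:APITOKEN is a placeholder
curl -X POST -u USER:APITOKEN https://amplab.cs.berkeley.edu/jenkins/cancelQuietDown
# or, via the cli jar:
# java -jar jenkins-cli.jar -s https://amplab.cs.berkeley.edu/jenkins/ cancel-quiet-down
/snip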

this will be quick, and i'll restart the jobs i've killed.

this DOES NOT affect the restart/maintenance tomorrow morning.

sorry about the inconvenience,

shane

On Mon, Oct 27, 2014 at 10:46 AM, shane knapp skn...@berkeley.edu wrote:

 i'll be bringing jenkins down tomorrow morning for some system maintenance
 and to get our backups kicked off.

 i do expect to have the system back up and running before 8am.

 please let me know ASAP if i need to reschedule this.

 thanks,

 shane



Re: jenkins emergency restart now, was Re: jenkins downtime tomorrow morning ~6am-8am PDT

2014-10-27 Thread shane knapp
ok we're back up and building.  i've retriggered the jobs i killed.

On Mon, Oct 27, 2014 at 1:24 PM, shane knapp skn...@berkeley.edu wrote:

 so, i'm having a race condition between a plugin i installed putting
 jenkins in to quiet mode and it failing to perform a backup from this past
 weekend.  i'll need to restart the process and get it out of the
 constantly-in-to-quiet-mode cycle it's in now.

 this will be quick, and i'll restart the jobs i've killed.

 this DOES NOT affect the restart/maintenance tomorrow morning.

 sorry about the inconvenience,

 shane

 On Mon, Oct 27, 2014 at 10:46 AM, shane knapp skn...@berkeley.edu wrote:

 i'll be bringing jenkins down tomorrow morning for some system
 maintenance and to get our backups kicked off.

 i do expect to have the system back up and running before 8am.

 please let me know ASAP if i need to reschedule this.

 thanks,

 shane





Re: jenkins downtime tomorrow morning ~6am-8am PDT

2014-10-28 Thread shane knapp
this is done, and jenkins is up and building again.

On Mon, Oct 27, 2014 at 10:46 AM, shane knapp skn...@berkeley.edu wrote:

 i'll be bringing jenkins down tomorrow morning for some system maintenance
 and to get our backups kicked off.

 i do expect to have the system back up and running before 8am.

 please let me know ASAP if i need to reschedule this.

 thanks,

 shane



[important] jenkins down

2014-11-20 Thread shane knapp
i noticed that there were no builds, and noticed that it's throwing a bunch
of exceptions in the log file.

i'm looking in to this right now and will update when i get things rolling
again.

sorry for the inconvenience,

shane


Re: [important] jenkins down

2014-11-20 Thread shane knapp
ok, we're back up and building now...  looks like there was a seriously bad
git (or github) plugin update that caused all sorts of unintended
consequences, mostly with cron stacktracing.

i'll take a closer look and see if i can find out exactly what happened,
but suffice to say, we'll be really cautious when updating even recommended
plugins.

sorry for the disruption!

shane

On Thu, Nov 20, 2014 at 10:21 AM, shane knapp skn...@berkeley.edu wrote:

 i noticed that there were no builds, and noticed that it's throwing a
 bunch of exceptions in the log file.

 i'm looking in to this right now and will update when i get things rolling
 again.

 sorry for the inconvenience,

 shane



jenkins downtime: 730-930am, 12/12/14

2014-12-01 Thread shane knapp
i'll send out a reminder next week, but i wanted to give a heads up:  i'll
be bringing down the entire jenkins infrastructure for reboots and system
updates.

please let me know if there are any conflicts with this, thanks!

shane


adding new jenkins worker nodes to eventually replace existing ones

2014-12-09 Thread shane knapp
i just turned up a new jenkins slave (amp-jenkins-worker-01) to ensure it
builds properly.  these machines have half the ram, same number of
processors and more disk, which will hopefully help us achieve more than
the ~15-20% system utilization we're getting on the current
amp-jenkins-slave-{01..05} nodes.

instead of 5 super beefy slaves w/16 workers each, we're planning on 8 less
beefy slaves w/12 workers each.  this should definitely cut down on the
build queue, and not impact build times in a negative way at all.

i'll keep a close eye on amp-jenkins-worker-01 before i start releasing the
other seven in to the wild.

there should be a minimal user impact, but if i happen to miss something,
please don't hesitate to let me know!

thanks,

shane


Re: adding new jenkins worker nodes to eventually replace existing ones

2014-12-09 Thread shane knapp
forgot to install git on this node.  /headdesk

i retriggered the failed spark prb jobs.

On Tue, Dec 9, 2014 at 10:49 AM, shane knapp skn...@berkeley.edu wrote:

 i just turned up a new jenkins slave (amp-jenkins-worker-01) to ensure it
 builds properly.  these machines have half the ram, same number of
 processors and more disk, which will hopefully help us achieve more than
 the ~15-20% system utilization we're getting on the current
 amp-jenkins-slave-{01..05} nodes.

 instead of 5 super beefy slaves w/16 workers each, we're planning on 8
 less beefy slaves w/12 workers each.  this should definitely cut down on
 the build queue, and not impact build times in a negative way at all.

 i'll keep a close eye on amp-jenkins-worker-01 before i start releasing
 the other seven in to the wild.

 there should be a minimal user impact, but if i happen to miss something,
 please don't hesitate to let me know!

 thanks,

 shane



Re: jenkins downtime: 730-930am, 12/12/14

2014-12-10 Thread shane knapp
reminder -- this is happening friday morning @ 730am!

On Mon, Dec 1, 2014 at 5:10 PM, shane knapp skn...@berkeley.edu wrote:

 i'll send out a reminder next week, but i wanted to give a heads up:  i'll
 be bringing down the entire jenkins infrastructure for reboots and system
 updates.

 please let me know if there are any conflicts with this, thanks!

 shane



Re: jenkins downtime: 730-930am, 12/12/14

2014-12-12 Thread shane knapp
reminder:  jenkins is going down NOW.

On Thu, Dec 11, 2014 at 3:08 PM, shane knapp skn...@berkeley.edu wrote:

 here's the plan...  reboots, of course, come last.  :)

 pause build queue at 7am, kill off (and eventually retrigger) any
 stragglers at 8am.  then begin maintenance:

 all systems:
 * yum update all servers (amp-jenkins-master, amp-jenkins-slave-{01..05},
 amp-jenkins-worker-{01..08})
 * reboots

 jenkins slaves:
 * install python2.7 (alongside 2.6, which would remain the default)
 * install numpy 1.9.1 (currently on 1.4, breaking some spark branch builds)
 * add new slaves to the master, remove old ones (keep them around just in
 case)

 there will be no jenkins system or plugin upgrades at this time.  things
 there seem to be working just fine!

 i'm expecting to be up and building by 9am at the latest.  i'll update
 this thread w/any new time estimates.

 word.

 shane, your rained-in devops guy :)

 On Wed, Dec 10, 2014 at 11:28 AM, shane knapp skn...@berkeley.edu wrote:

 reminder -- this is happening friday morning @ 730am!

 On Mon, Dec 1, 2014 at 5:10 PM, shane knapp skn...@berkeley.edu wrote:

 i'll send out a reminder next week, but i wanted to give a heads up:
  i'll be bringing down the entire jenkins infrastructure for reboots and
 system updates.

 please let me know if there are any conflicts with this, thanks!

 shane






Re: jenkins downtime: 730-930am, 12/12/14

2014-12-12 Thread shane knapp
downtime is extended to 10am PST so that i can finish testing the numpy
upgrade...  besides that, everything looks good and the system updates and
reboots went off w/o a hitch.
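
(for reference, verifying the upgrade on a worker boils down to something
like this -- just a sketch:)

snip
# sketch: confirm both pythons are present and numpy is at the new version
/usr/bin/python2.6 -V
python2.7 -V
python2.7 -c "import numpy; print(numpy.__version__)"          # expect 1.9.1
/usr/bin/python2.6 -c "import numpy; print(numpy.__version__)"
/snip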

shane

On Fri, Dec 12, 2014 at 7:26 AM, shane knapp skn...@berkeley.edu wrote:

 reminder:  jenkins is going down NOW.

 On Thu, Dec 11, 2014 at 3:08 PM, shane knapp skn...@berkeley.edu wrote:

 here's the plan...  reboots, of course, come last.  :)

 pause build queue at 7am, kill off (and eventually retrigger) any
 stragglers at 8am.  then begin maintenance:

 all systems:
 * yum update all servers (amp-jenkins-master, amp-jenkins-slave-{01..05},
 amp-jenkins-worker-{01..08})
 * reboots

 jenkins slaves:
 * install python2.7 (alongside 2.6, which would remain the default)
 * install numpy 1.9.1 (currently on 1.4, breaking some spark branch
 builds)
 * add new slaves to the master, remove old ones (keep them around just in
 case)

 there will be no jenkins system or plugin upgrades at this time.  things
 there seem to be working just fine!

 i'm expecting to be up and building by 9am at the latest.  i'll update
 this thread w/any new time estimates.

 word.

 shane, your rained-in devops guy :)

 On Wed, Dec 10, 2014 at 11:28 AM, shane knapp skn...@berkeley.edu
 wrote:

 reminder -- this is happening friday morning @ 730am!

 On Mon, Dec 1, 2014 at 5:10 PM, shane knapp skn...@berkeley.edu wrote:

 i'll send out a reminder next week, but i wanted to give a heads up:
  i'll be bringing down the entire jenkins infrastructure for reboots and
 system updates.

 please let me know if there are any conflicts with this, thanks!

 shane







Re: jenkins downtime: 730-930am, 12/12/14

2014-12-12 Thread shane knapp
ok, we're back up w/all new jenkins workers.  i'll be keeping an eye on
these pretty closely today for any build failures caused by the new
systems, and if things look bleak, i'll switch back to the original five.

thanks for your patience!

On Fri, Dec 12, 2014 at 8:47 AM, shane knapp skn...@berkeley.edu wrote:

 downtime is extended to 10am PST so that i can finish testing the numpy
 upgrade...  besides that, everything looks good and the system updates and
 reboots went off w/o a hitch.

 shane

 On Fri, Dec 12, 2014 at 7:26 AM, shane knapp skn...@berkeley.edu wrote:

 reminder:  jenkins is going down NOW.

 On Thu, Dec 11, 2014 at 3:08 PM, shane knapp skn...@berkeley.edu wrote:

 here's the plan...  reboots, of course, come last.  :)

 pause build queue at 7am, kill off (and eventually retrigger) any
 stragglers at 8am.  then begin maintenance:

 all systems:
 * yum update all servers (amp-jenkins-master, amp-jenkins-slave-{01..05},
 amp-jenkins-worker-{01..08})
 * reboots

 jenkins slaves:
 * install python2.7 (alongside 2.6, which would remain the default)
 * install numpy 1.9.1 (currently on 1.4, breaking some spark branch
 builds)
 * add new slaves to the master, remove old ones (keep them around just
 in case)

 there will be no jenkins system or plugin upgrades at this time.  things
 there seem to be working just fine!

 i'm expecting to be up and building by 9am at the latest.  i'll update
 this thread w/any new time estimates.

 word.

 shane, your rained-in devops guy :)

 On Wed, Dec 10, 2014 at 11:28 AM, shane knapp skn...@berkeley.edu
 wrote:

 reminder -- this is happening friday morning @ 730am!

 On Mon, Dec 1, 2014 at 5:10 PM, shane knapp skn...@berkeley.edu
 wrote:

 i'll send out a reminder next week, but i wanted to give a heads up:
  i'll be bringing down the entire jenkins infrastructure for reboots and
 system updates.

 please let me know if there are any conflicts with this, thanks!

 shane








Re: jenkins downtime: 730-930am, 12/12/14

2014-12-14 Thread shane knapp
josh rosen has this PR open to address the streaming test failures:

https://github.com/apache/spark/pull/3687

On Sun, Dec 14, 2014 at 8:21 AM, WangTaoTheTonic barneystin...@aliyun.com
wrote:

 Jenkins is still not available now as some unit tests(about streaming)
 failed
 all the time. Does it have something to do with this update?







Re: Archiving XML test reports for analysis

2014-12-15 Thread shane knapp
right now, the following logs are archived on to the master:

  local log_files=$(
find .\
  -name unit-tests.log -o\
  -path ./sql/hive/target/HiveCompatibilitySuite.failed -o\
  -path ./sql/hive/target/HiveCompatibilitySuite.hiveFailed -o\
  -path ./sql/hive/target/HiveCompatibilitySuite.wrong
  )

regarding dumping stuff to S3 -- thankfully, since we're not looking at a
lot of disk usage, i don't see a problem w/this.  we could tar/zip up the
XML for each build and just dump it there.
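
(something along these lines as a post-build step would probably do it --
just a sketch; the bucket name is made up and it assumes the aws cli is
available on the workers:)

snip
# sketch: bundle a build's junit xml and push it to a shared bucket.
# "spark-test-reports" is a placeholder bucket name.
tarball="${JOB_NAME}-${BUILD_NUMBER}-test-reports.tar.gz"
find . -path "*/test-reports/*.xml" -print0 | tar czf "$tarball" --null -T -
aws s3 cp "$tarball" "s3://spark-test-reports/${JOB_NAME}/${BUILD_NUMBER}/"
/snip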

what builds are we thinking about?  spark pull request builder?  what
others?

On Mon, Dec 15, 2014 at 1:33 AM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 Every time we run a test cycle on our Jenkins cluster, we generate hundreds
 of XML reports covering all the tests we have (e.g.

 `streaming/target/test-reports/org.apache.spark.streaming.util.WriteAheadLogSuite.xml`).

 These reports contain interesting information about whether tests succeeded
 or failed, and how long they took to complete. There is also detailed
 information about the environment they ran in.

 It might be valuable to have a window into all these reports across all
 Jenkins builds and across all time, and use that to track basic statistics
 about our tests. That could give us basic insight into what tests are flaky
 or slow, and perhaps drive other improvements to our testing infrastructure
 that we can't see just yet.

 Do people think that would be valuable? Do we already have something like
 this?

 I'm thinking for starters it might be cool if we automatically uploaded all
 the XML test reports from the Master and the Pull Request builders to an S3
 bucket and just opened it up for the dev community to analyze.

 Nick



Re: Archiving XML test reports for analysis

2014-12-15 Thread shane knapp
i have no problem w/storing all of the logs.  :)

i also have no problem w/donated S3 buckets.  :)

On Mon, Dec 15, 2014 at 2:39 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 How about all of them https://amplab.cs.berkeley.edu/jenkins/view/Spark/? 
 How
 much data per day would it roughly be if we uploaded all the logs for all
 these builds?

 Also, would Databricks be willing to offer up an S3 bucket for this
 purpose?

 Nick

 On Mon Dec 15 2014 at 11:48:44 AM shane knapp skn...@berkeley.edu wrote:

 right now, the following logs are archived on to the master:

   local log_files=$(
 find .\
   -name unit-tests.log -o\
   -path ./sql/hive/target/HiveCompatibilitySuite.failed -o\
   -path ./sql/hive/target/HiveCompatibilitySuite.hiveFailed -o\
   -path ./sql/hive/target/HiveCompatibilitySuite.wrong
   )

 regarding dumping stuff to S3 -- thankfully, since we're not looking at a
 lot of disk usage, i don't see a problem w/this.  we could tar/zip up the
 XML for each build and just dump it there.

 what builds are we thinking about?  spark pull request builder?  what
 others?

 On Mon, Dec 15, 2014 at 1:33 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Every time we run a test cycle on our Jenkins cluster, we generate
 hundreds
 of XML reports covering all the tests we have (e.g.

 `streaming/target/test-reports/org.apache.spark.streaming.util.WriteAheadLogSuite.xml`).

 These reports contain interesting information about whether tests
 succeeded
 or failed, and how long they took to complete. There is also detailed
 information about the environment they ran in.

 It might be valuable to have a window into all these reports across all
 Jenkins builds and across all time, and use that to track basic
 statistics
 about our tests. That could give us basic insight into what tests are
 flaky
 or slow, and perhaps drive other improvements to our testing
 infrastructure
 that we can't see just yet.

 Do people think that would be valuable? Do we already have something like
 this?

 I'm thinking for starters it might be cool if we automatically uploaded
 all
 the XML test reports from the Master and the Pull Request builders to an
 S3
 bucket and just opened it up for the dev community to analyze.

 Nick




Re: Jenkins install reference

2015-02-03 Thread shane knapp
here's the wiki describing the system setup:
https://cwiki.apache.org/confluence/display/SPARK/Spark+QA+Infrastructure

we have 1 master and 8 worker nodes, 12 executors per worker (we'd be
better off w/more and smaller worker nodes however).

you don't need to install sbt -- it's in the build/ directory.

the pull request builder builds in parallel, but the master builds require
specific ports to be reserved and each build effectively locks down a
worker until it's done.  since we have 8 worker nodes, it's not *that* big
of a deal...
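
(concretely, on a bare machine all you need is a jdk and git -- the whole
cycle is roughly this, as a sketch:)

snip
# sketch: what a worker effectively runs; the sbt launcher is fetched
# automatically by the wrapper in build/, nothing else to install
git clone https://github.com/apache/spark.git && cd spark
./build/sbt clean package    # plain compile/package
./dev/run-tests              # the full test cycle (the PRB wraps this)
/snip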

shane

On Tue, Feb 3, 2015 at 4:36 AM, scwf wangf...@huawei.com wrote:

 Here are my questions:
 1. how do we set up jenkins so that it builds multiple PRs in parallel? or
 can one machine only support one PR build at a time?
 2. do we need to install sbt on the CI machine, since the script
 dev/run-tests will auto-fetch the sbt jar?

 - Fei



 On 2015/2/3 15:53, scwf wrote:

 Hi, all
we want to set up a CI env for spark in our team, is there any
 reference of how to install jenkins over spark?
Thanks

 Fei












Re: spark 1.3 sbt build seems to be broken

2015-02-05 Thread shane knapp
here's the hash of the breaking commit:

Started on Feb 5, 2015 12:01:01 PM
Using strategy: Default
[poll] Last Built Revision: Revision
de112a2096a2b84ce2cac112f12b50b5068d6c35
(refs/remotes/origin/branch-1.3)
  git ls-remote -h https://github.com/apache/spark.git branch-1.3 # timeout=10
[poll] Latest remote head revision is: fba2dc663a644cfe76a744b5cace93e9d6646a25
Done. Took 2.5 sec
Changes found


from:  https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/18/pollingLog/


On Thu, Feb 5, 2015 at 5:01 PM, shane knapp skn...@berkeley.edu wrote:

 https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/

 we're seeing java OOMs and heap space errors:

 https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/19/console

 https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/18/console

 memory leak?  i checked the systems (ganglia + logging in and 'free -g')
 and there's nothing going on there.

 20 is building right now:
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/20/console



spark 1.3 sbt build seems to be broken

2015-02-05 Thread shane knapp
https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/

we're seeing java OOMs and heap space errors:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/19/console
https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/18/console

memory leak?  i checked the systems (ganglia + logging in and 'free -g')
and there's nothing going on there.

20 is building right now:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/20/console
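
(if this ends up being plain jvm heap exhaustion rather than an actual
leak on the boxes, the usual knob is the sbt jvm options -- a sketch, the
values are guesses and this assumes the job launches sbt through a
wrapper that honors SBT_OPTS:)

snip
# sketch: give the sbt jvm more headroom (java 7 era flags, values TBD)
export SBT_OPTS="-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=256m"
/snip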


Re: quick jenkins restart tomorrow morning, ~7am PST

2015-02-18 Thread shane knapp
i'm actually going to do this now -- it's really quiet today.

there are two spark pull request builds running, which i will kill and
retrigger once jenkins is back up:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27689/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27690/

On Wed, Feb 18, 2015 at 12:55 PM, shane knapp skn...@berkeley.edu wrote:

 i'll be kicking jenkins to up the open file limits on the workers.  it
 should be a very short downtime, and i'll post updates on my progress
 tomorrow.

 shane



quick jenkins restart tomorrow morning, ~7am PST

2015-02-18 Thread shane knapp
i'll be kicking jenkins to up the open file limits on the workers.  it
should be a very short downtime, and i'll post updates on my progress
tomorrow.
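
(concretely, this is just bumping nofile for the jenkins user on each
worker -- a sketch, the actual number we settle on may differ:)

snip
# sketch: /etc/security/limits.d/99-jenkins.conf on each worker
# (takes effect on the next login / agent restart, hence the jenkins kick)
jenkins  soft  nofile  65536
jenkins  hard  nofile  65536

# verify from inside a build:
ulimit -n
/snip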

shane


Re: emergency jenkins restart soon

2015-01-29 Thread shane knapp
the master builds triggered around ~1am last night (according to the logs),
so it looks like we're back in business.

On Wed, Jan 28, 2015 at 10:32 PM, shane knapp skn...@berkeley.edu wrote:

 np!  the master builds haven't triggered yet, but let's give the rube
 goldberg machine a minute to get its bearings.

 On Wed, Jan 28, 2015 at 10:31 PM, Reynold Xin r...@databricks.com wrote:

 Thanks for doing that, Shane!


 On Wed, Jan 28, 2015 at 10:29 PM, shane knapp skn...@berkeley.edu
 wrote:

 jenkins is back up and all builds have been retriggered...  things are
 building and looking good, and i'll keep an eye on the spark master
 builds
 tonite and tomorrow.

 On Wed, Jan 28, 2015 at 9:56 PM, shane knapp skn...@berkeley.edu
 wrote:

  the spark master builds stopped triggering ~yesterday and the logs
 don't
  show anything.  i'm going to give the current batch of spark pull
 request
  builder jobs a little more time (~30 mins) to finish, then kill
 whatever is
  left and restart jenkins.  anything that was queued or killed will be
  retriggered once jenkins is back up.
 
  sorry for the inconvenience, we'll get this sorted asap.
 
  thanks,
 
  shane
 






Re: emergency jenkins restart soon

2015-01-28 Thread shane knapp
jenkins is back up and all builds have been retriggered...  things are
building and looking good, and i'll keep an eye on the spark master builds
tonite and tomorrow.

On Wed, Jan 28, 2015 at 9:56 PM, shane knapp skn...@berkeley.edu wrote:

 the spark master builds stopped triggering ~yesterday and the logs don't
 show anything.  i'm going to give the current batch of spark pull request
 builder jobs a little more time (~30 mins) to finish, then kill whatever is
 left and restart jenkins.  anything that was queued or killed will be
 retriggered once jenkins is back up.

 sorry for the inconvenience, we'll get this sorted asap.

 thanks,

 shane



Re: emergency jenkins restart soon

2015-01-28 Thread shane knapp
np!  the master builds haven't triggered yet, but let's give the rube
goldberg machine a minute to get its bearings.

On Wed, Jan 28, 2015 at 10:31 PM, Reynold Xin r...@databricks.com wrote:

 Thanks for doing that, Shane!


 On Wed, Jan 28, 2015 at 10:29 PM, shane knapp skn...@berkeley.edu wrote:

 jenkins is back up and all builds have been retriggered...  things are
 building and looking good, and i'll keep an eye on the spark master builds
 tonite and tomorrow.

 On Wed, Jan 28, 2015 at 9:56 PM, shane knapp skn...@berkeley.edu wrote:

  the spark master builds stopped triggering ~yesterday and the logs don't
  show anything.  i'm going to give the current batch of spark pull
 request
  builder jobs a little more time (~30 mins) to finish, then kill
 whatever is
  left and restart jenkins.  anything that was queued or killed will be
  retriggered once jenkins is back up.
 
  sorry for the inconvenience, we'll get this sorted asap.
 
  thanks,
 
  shane
 





adding some temporary jenkins worker nodes...

2015-02-09 Thread shane knapp
...to help w/the build backlog.  let's all welcome
amp-jenkins-slave-{01..03} back to the fray!


jenkins redirect down (but jenkins is up!), lots of potential

2015-01-05 Thread shane knapp
UC Berkeley had some major maintenance done this past weekend, and long
story short, not everything came back.  our primary webserver's NFS is down
and that means we're not serving websites, meaning that the redirect to
jenkins is failing.

jenkins is still up, and building some jobs, but we will probably see pull
request builder failures, and other transient issues.  SCM-polling builds
should be fine.

there is no ETA on when this will be fixed, but once our
amplab.cs.berkeley.edu/jenkins redir is working, i will let everyone know.
 i'm trying to get more status updates as they come.

i'm really sorry about the inconvenience.

shane


Re: jenkins redirect down (but jenkins is up!), lots of potential

2015-01-06 Thread shane knapp
the regular url is working now, thanks for your patience.

On Mon, Jan 5, 2015 at 2:25 PM, Josh Rosen rosenvi...@gmail.com wrote:

 The pull request builder and SCM-polling builds appear to be working fine,
 but the links in pull request comments won't work because the AMP Lab
 webserver is still down.  In the meantime, though, you can continue to
 access Jenkins through https://hadrian.ist.berkeley.edu/jenkins/

 On Mon, Jan 5, 2015 at 10:37 AM, shane knapp skn...@berkeley.edu wrote:

 UC Berkeley had some major maintenance done this past weekend, and long
 story short, not everything came back.  our primary webserver's NFS is
 down
 and that means we're not serving websites, meaning that the redirect to
 jenkins is failing.

 jenkins is still up, and building some jobs, but we will probably see pull
 request builder failures, and other transient issues.  SCM-polling builds
 should be fine.

 there is no ETA on when this will be fixed, but once our
 amplab.cs.berkeley.edu/jenkins redir is working, i will let everyone
 know.
  i'm trying to get more status updates as they come.

 i'm really sorry about the inconvenience.

 shane





Re: extended jenkins downtime monday, march 16th, plus some hints at the future

2015-03-16 Thread shane knapp
ok, we're back up and building.  upgrading the github plugin (and possibly
EnvInject) caused the stacktraces, so i've kept those at the old versions
that were working before.  jenkins and the rest of the plugins are updated
and we're g2g.

i'll be, of course, keeping an eye on things today and will squash anything
else that pops up.

On Mon, Mar 16, 2015 at 9:06 AM, shane knapp skn...@berkeley.edu wrote:

 looks like we're having some issues w/the pull request builder and cron
 stacktraces in the logs.  i'll be investigating further and will update
 when i figure out what's going on.

 On Mon, Mar 16, 2015 at 7:51 AM, shane knapp skn...@berkeley.edu wrote:

 this is starting now.

 On Fri, Mar 13, 2015 at 10:12 AM, shane knapp skn...@berkeley.edu
 wrote:

 i'll be taking jenkins down for some much-needed plugin updates, as well
 as potentially upgrading jenkins itself.

 this will start at 730am PDT, and i'm hoping to have everything up by
 noon.

 the move to the anaconda python will take place in the next couple of
 weeks as i'm in the process of rebuilding my staging environment (much
 needed) to better reflect production, and allow me to better test the
 change.

 and finally, some teasers for what's coming up in the next month or so:

 * move to a fully puppetized environment (yay no more shell script
 deployments!)
 * virtualized workers (including multiple OSes -- OS X, ubuntu, ...,
 profit?)

 more details as they come.

 happy friday!

 shane






Re: extended jenkins downtime monday, march 16th, plus some hints at the future

2015-03-16 Thread shane knapp
this is starting now.

On Fri, Mar 13, 2015 at 10:12 AM, shane knapp skn...@berkeley.edu wrote:

 i'll be taking jenkins down for some much-needed plugin updates, as well
 as potentially upgrading jenkins itself.

 this will start at 730am PDT, and i'm hoping to have everything up by noon.

 the move to the anaconda python will take place in the next couple of
 weeks as i'm in the process of rebuilding my staging environment (much
 needed) to better reflect production, and allow me to better test the
 change.

 and finally, some teasers for what's coming up in the next month or so:

 * move to a fully puppetized environment (yay no more shell script
 deployments!)
 * virtualized workers (including multiple OSes -- OS X, ubuntu, ...,
 profit?)

 more details as they come.

 happy friday!

 shane



extended jenkins downtime monday, march 16th, plus some hints at the future

2015-03-13 Thread shane knapp
i'll be taking jenkins down for some much-needed plugin updates, as well as
potentially upgrading jenkins itself.

this will start at 730am PDT, and i'm hoping to have everything up by noon.

the move to the anaconda python will take place in the next couple of weeks
as i'm in the process of rebuilding my staging environment (much needed) to
better reflect production, and allow me to better test the change.

and finally, some teasers for what's coming up in the next month or so:

* move to a fully puppetized environment (yay no more shell script
deployments!)
* virtualized workers (including multiple OSes -- OS X, ubuntu, ...,
profit?)

more details as they come.

happy friday!

shane


jenkins httpd being flaky

2015-03-13 Thread shane knapp
we just started having issues when visiting jenkins and getting 503 service
unavailable errors.

i'm on it and will report back with an all-clear.


Re: jenkins httpd being flaky

2015-03-13 Thread shane knapp
ok, things seem to have stabilized...  httpd hasn't flaked since ~noon, the
hanging PRB job on amp-jenkins-worker-06 was removed w/the restart and
things are now building.

i cancelled and retriggered a bunch of PRB builds, btw:
4848 (https://github.com/apache/spark/pull/3699)
5922 (https://github.com/apache/spark/pull/4733)
5987 (https://github.com/apache/spark/pull/4986)
6222 (https://github.com/apache/spark/pull/4964)
6325 (https://github.com/apache/spark/pull/5018)

as well as:
spark-master-maven-with-yarn

sorry for the inconvenience...  i'm still a little stumped as to what
happened, but i think it was a confluence of events (httpd flaking,
problems at github, mercury in retrograde, friday thinking it's monday).

shane

On Fri, Mar 13, 2015 at 1:08 PM, shane knapp skn...@berkeley.edu wrote:

 i tried a couple of things, but will also be doing a jenkins reboot as
 soon as the current batch of builds finish.



 On Fri, Mar 13, 2015 at 12:40 PM, shane knapp skn...@berkeley.edu wrote:

 ok we have a few different things happening:

 1) httpd on the jenkins master is randomly (though not currently) flaking
 out and causing visits to the site to return a 503.  nothing in the logs
 shows any problems.

 2) there are some github timeouts, which i tracked down and think it's a
 problem with github themselves (see:  https://status.github.com/ and
 scroll down to 'mean hook delivery time')

 3) we have one spark job w/a strange ivy lock issue, that i just
 retriggered (https://github.com/apache/spark/pull/4964)

 4) there's an errant, unkillable pull request builder job (
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28574/console
 )

 more updates forthcoming.

 On Fri, Mar 13, 2015 at 12:04 PM, shane knapp skn...@berkeley.edu
 wrote:

 we just started having issues when visiting jenkins and getting 503
 service unavailable errors.

 i'm on it and will report back with an all-clear.






Re: jenkins httpd being flaky

2015-03-13 Thread shane knapp
i tried a couple of things, but will also be doing a jenkins reboot as soon
as the current batch of builds finish.



On Fri, Mar 13, 2015 at 12:40 PM, shane knapp skn...@berkeley.edu wrote:

 ok we have a few different things happening:

 1) httpd on the jenkins master is randomly (though not currently) flaking
 out and causing visits to the site to return a 503.  nothing in the logs
 shows any problems.

 2) there are some github timeouts, which i tracked down and think it's a
 problem with github themselves (see:  https://status.github.com/ and
 scroll down to 'mean hook delivery time')

 3) we have one spark job w/a strange ivy lock issue, that i just
 retriggered (https://github.com/apache/spark/pull/4964)

 4) there's an errant, unkillable pull request builder job (
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28574/console
 )

 more updates forthcoming.

 On Fri, Mar 13, 2015 at 12:04 PM, shane knapp skn...@berkeley.edu wrote:

 we just started having issues when visiting jenkins and getting 503
 service unavailable errors.

 i'm on it and will report back with an all-clear.





Re: PR Builder timing out due to ivy cache lock

2015-03-13 Thread shane knapp
i'm thinking that this was something transient, and hopefully won't happen
again.  a ton of weird stuff happened around the time of this failure (see
my flaky httpd email), and this was the only build exhibiting this behavior.

i'll keep an eye out for this failure over the weekend...



On Fri, Mar 13, 2015 at 12:03 PM, Hari Shreedharan 
hshreedha...@cloudera.com wrote:

 Here you are:
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28571/consoleFull

 On Fri, Mar 13, 2015 at 11:58 AM, shane knapp skn...@berkeley.edu wrote:

 link to a build, please?

 On Fri, Mar 13, 2015 at 11:53 AM, Hari Shreedharan 
 hshreedha...@cloudera.com wrote:

 Looks like something is causing the PR Builder to timeout since this
 morning with the ivy cache being locked.

 Any idea what is happening?






jenkins upgraded to 1.606....

2015-03-25 Thread shane knapp
...due to some big security fixes:

https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2015-03-23

:)

shane


short jenkins 7am downtime tomorrow morning (3-5-15)

2015-03-04 Thread shane knapp
the master and workers need some system and package updates, and i'll also
be rebooting the machines as well.

this shouldn't take very long to perform, and i expect jenkins to be back
up and building by 9am at the *latest*.

important note:  i will NOT be updating jenkins or any of the plugins
during this maintenance!

as always, please let me know if you have any questions or concerns.

danke shane


[jenkins infra -- pls read ] installing anaconda, moving default python from 2.6 -> 2.7

2015-02-23 Thread shane knapp
good morning, developers!

TL;DR:

i will be installing anaconda and setting it in the system PATH so that
your python will default to 2.7; anaconda will also take over management of
all of the sci-py packages.  this is potentially a big change, so i'll be
testing locally on my staging instance before deployment to the wide world.

deployment is *tentatively* next monday, march 2nd.

a little background:

the jenkins test infra is currently (and happily) managed by a set of tools
that allow me to set up and deploy new workers, manage their packages and
make sure that all spark and research projects can happily and successfully
build.

we're currently at the state where ~50 or so packages are installed and
configured on each worker.  this is getting a little cumbersome, as the
package-to-build dep tree is getting pretty large.

the biggest offender is the science-based python infrastructure.
 everything is blindly installed w/yum and pip, so it's hard to control
*exactly* what version of any given library is installed, as compared to what's on a
dev's laptop.

the solution:

anaconda (https://store.continuum.io/cshop/anaconda/)!  everything is
centralized!  i can manage specific versions much easier!

what this means to you:

* python 2.7 will be the default system python.
* 2.6 will still be installed and available (/usr/bin/python or
/usr/bin/python2.6)

what you need to do:
* install anaconda, have it update your PATH
* build locally and try to fix any bugs (for spark, this should just work)
* if you have problems, reach out to me and i'll see what i can do to help.
 if we can't get your stuff running under python2.7, we can default to 2.6
via a job config change.

what i will be doing:
* setting up anaconda on my staging instance and spot-testing a lot of
builds before deployment
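
for the curious, the switch should look roughly like this from a shell on a
worker or your laptop (the anaconda install location below is just an example
-- use wherever your installer actually put it):

# put anaconda first on your PATH (example location; adjust as needed)
export PATH="$HOME/anaconda/bin:$PATH"

# confirm which interpreter you now pick up -- it should report 2.7.x
which python
python -c 'import sys; print(sys.version)'

# the system 2.6 will still be around if a job really needs it
/usr/bin/python2.6 -V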

please let me know if there are any issues/concerns...  i'll be posting
updates this week and will let everyone know if there are any changes to
the Plan[tm].

your friendly devops engineer,

shane


Re: [jenkins infra -- pls read ] installing anaconda, moving default python from 2.6 -> 2.7

2015-02-23 Thread shane knapp
On Mon, Feb 23, 2015 at 11:36 AM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 The first concern for Spark will probably be to ensure that we still build
 and test against Python 2.6, since that's the minimum version of Python we
 support.

sounds good...  we can set up separate 2.6 builds on specific versions...
this could allow you to easily differentiate between "baseline" and "latest
and greatest" if you wanted.  it'll have a little bit more administrative
overhead, due to more jobs needing configs, but offers more flexibility.

let me know what you think.


 Otherwise this seems OK. We use numpy and other Python packages in
 PySpark, but I don't think we're pinned to any particular version of those
 packages.

cool.  i'll start mucking about and let you guys know how it goes.

shane


Re: [jenkins infra -- pls read ] installing anaconda, moving default python from 2.6 -> 2.7

2015-02-25 Thread shane knapp
i'm going to punt on this until after the next spark 1.3 release (2-3
weeks?).  since i'll be installing a bunch of other packages (including
mongodb), i'd rather wait and be safe.  :)

the full install list is forthcoming, and i'll update the spark infra wiki
w/what's installed on the workers.

shane

On Mon, Feb 23, 2015 at 11:13 AM, shane knapp skn...@berkeley.edu wrote:

 good morning, developers!

 TL;DR:

 i will be installing anaconda and setting it in the system PATH so that
 your python will default to 2.7, as well as it taking over management of
 all of the sci-py packages.  this is potentially a big change, so i'll be
 testing locally on my staging instance before deployment to the wide world.

 deployment is *tentatively* next monday, march 2nd.

 a little background:

 the jenkins test infra is currently (and happily) managed by a set of
 tools that allow me to set up and deploy new workers, manage their packages
 and make sure that all spark and research projects can happily and
 successfully build.

 we're currently at the state where ~50 or so packages are installed and
 configured on each worker.  this is getting a little cumbersome, as the
 package-to-build dep tree is getting pretty large.

 the biggest offender is the science-based python infrastructure.
  everything is blindly installed w/yum and pip, so it's hard to control
 *exactly* what version of any given library is as compared to what's on a
 dev's laptop.

 the solution:

 anaconda (https://store.continuum.io/cshop/anaconda/)!  everything is
 centralized!  i can manage specific versions much easier!

 what this means to you:

 * python 2.7 will be the default system python.
 * 2.6 will still be installed and available (/usr/bin/python or
 /usr/bin/python2.6)

 what you need to do:
 * install anaconda, have it update your PATH
 * build locally and try to fix any bugs (for spark, this should just
 work)
 * if you have problems, reach out to me and i'll see what i can do to
 help.  if we can't get your stuff running under python2.7, we can default
 to 2.6 via a job config change.

 what i will be doing:
 * setting up anaconda on my staging instance and spot-testing a lot of
 builds before deployment

 please let me know if there are any issues/concerns...  i'll be posting
 updates this week and will let everyone know if there are any changes to
 the Plan[tm].

 your friendly devops engineer,

 shane



Re: [ERROR] bin/compute-classpath.sh: fails with false positive test for java 1.7 vs 1.6

2015-02-24 Thread shane knapp
it's not downgraded, it's your /etc/alternatives setup that's causing this.

you can update all of those entries by executing the following commands (as
root):

update-alternatives --install /usr/bin/java java /usr/java/latest/bin/java 1
update-alternatives --install /usr/bin/javah javah /usr/java/latest/bin/javah 1
update-alternatives --install /usr/bin/javac javac /usr/java/latest/bin/javac 1
update-alternatives --install /usr/bin/jar jar /usr/java/latest/bin/jar 1

(i have the latest jdk installed in /usr/java/ with a /usr/java/latest/
symlink pointing to said jdk's dir)
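
once those are in place, a quick sanity check should show java and javac
agreeing on the major version (exact output will vary by jdk, but something
like):

java -version
javac -version

# shows which binary the javac alternative currently resolves to
update-alternatives --display javac

if javac still reports 1.6.x after this, some other entry in /etc/alternatives
is winning and needs the same treatment.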

On Tue, Feb 24, 2015 at 3:32 PM, Mike Hynes 91m...@gmail.com wrote:

 I don't see any version flag for /usr/bin/jar, but I think I see the
 problem now; the openjdk version is 7, but javac -version gives
 1.6.0_34; so spark was compiled with java 6 despite the system using
 jre 1.7.
 Thanks for the sanity check! Now I just need to find out why javac is
 downgraded on the system..

 On 2/24/15, Sean Owen so...@cloudera.com wrote:
  So you mean that the script is checking for this error, and takes it
  as a sign that you compiled with java 6.
 
  Your command seems to confirm that reading the assembly jar does fail
  on your system though. What version does the jar command show? are you
  sure you don't have JRE 7 but JDK 6 installed?
 
  On Tue, Feb 24, 2015 at 11:02 PM, Mike Hynes 91m...@gmail.com wrote:
  ./bin/compute-classpath.sh fails with error:
 
  > jar -tf assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop1.0.4.jar nonexistent/class/path
  java.util.zip.ZipException: invalid CEN header (bad signature)
  at java.util.zip.ZipFile.open(Native Method)
  at java.util.zip.ZipFile.<init>(ZipFile.java:132)
  at java.util.zip.ZipFile.<init>(ZipFile.java:93)
  at sun.tools.jar.Main.list(Main.java:997)
  at sun.tools.jar.Main.run(Main.java:242)
  at sun.tools.jar.Main.main(Main.java:1167)
 
  However, I both compiled the distribution and am running spark with
Java
  1.7;
  $ java -version
  java version 1.7.0_75
  OpenJDK Runtime Environment (IcedTea 2.5.4)
  (7u75-2.5.4-1~trusty1)
  OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
  on a system running Ubuntu:
  $ uname -srpov
  Linux 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014
  x86_64 GNU/Linux
  $ uname -srpo
  Linux 3.13.0-44-generic x86_64 GNU/Linux
 
  This problem was reproduced on Arch Linux:
 
  $ uname -srpo
  Linux 3.18.5-1-ARCH x86_64 GNU/Linux
  with
  $ java -version
  java version 1.7.0_75
  OpenJDK Runtime Environment (IcedTea 2.5.4) (Arch Linux build
  7.u75_2.5.4-1-x86_64)
  OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
 
  In both of these cases, the problem is not the java versioning;
  neither system even has a java 6 installation. This seems like a false
  positive to me in compute-classpath.sh.
 
  When I comment out the relevant lines in compute-classpath.sh, the
  scripts start-{master,slaves,...}.sh all run fine, and I have no
  problem launching applications.
 
  Could someone please offer some insight into this issue?
 
  Thanks,
  Mike
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 


 --
 Thanks,
 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



Jenkins down

2015-04-24 Thread shane knapp
jenkins is currently unreachable.  i'm not entirely sure why, as i can't
ssh in to the box and see what's going on.  i've filed a ticket and will
let everyone know when i have more information.

shane


Re: Jenkins down

2015-04-24 Thread shane knapp
looks like we had a power failure on campus, and our datacenter is working
to bring things back up:

http://systemstatus.berkeley.edu/

On Fri, Apr 24, 2015 at 11:24 AM, shane knapp skn...@berkeley.edu wrote:

 jenkins is currently unreachable.  i'm not entirely sure why, as i can't
 ssh in to the box and see what's going on.  i've filed a ticket and will
 let everyone know when i have more information.

 shane



Re: Jenkins down

2015-04-24 Thread shane knapp
thanks everyone!  happy friday!  :)

On Fri, Apr 24, 2015 at 3:37 PM, York, Brennon brennon.y...@capitalone.com
wrote:

 Ditto to Reynold. Thanks a bunch for all the updates and work Shane!

 On 4/24/15, 3:25 PM, Reynold Xin r...@databricks.com wrote:

 Thanks for looking into this, Shane.
 
 On Fri, Apr 24, 2015 at 3:18 PM, shane knapp skn...@berkeley.edu wrote:
 
  ok, jenkins is back up and building.  we have a few things to mop up
 here
  (ganglia is sad), but i think we'll be good for the afternoon.
 
  shane
 
  On Fri, Apr 24, 2015 at 2:17 PM, shane knapp skn...@berkeley.edu
 wrote:
 
   ok, power has been restored and jenkins is back up.  we might be
 taking
   things down again to fix up some power mis-cabling (jon and i are in
 the
   colo, and the jenkins master wasn't on the UPS and needs to be).
  
   more updates as they come.  sorry for the inconvenience.
  
   On Fri, Apr 24, 2015 at 11:33 AM, shane knapp skn...@berkeley.edu
  wrote:
  
   looks like we had a power failure on campus, and our datacenter is
   working to bring things back up:
  
   http://systemstatus.berkeley.edu/
  
   On Fri, Apr 24, 2015 at 11:24 AM, shane knapp skn...@berkeley.edu
   wrote:
  
   jenkins is currently unreachable.  i'm not entirely sure why, as i
  can't
   ssh in to the box and see what's going on.  i've filed a ticket and
  will
   let everyone know when i have more information.
  
   shane
  
  
  
  
 

 




Re: Jenkins down

2015-04-24 Thread shane knapp
ok, power has been restored and jenkins is back up.  we might be taking
things down again to fix up some power mis-cabling (jon and i are in the
colo, and the jenkins master wasn't on the UPS and needs to be).

more updates as they come.  sorry for the inconvenience.

On Fri, Apr 24, 2015 at 11:33 AM, shane knapp skn...@berkeley.edu wrote:

 looks like we had a power failure on campus, and our datacenter is working
 to bring things back up:

 http://systemstatus.berkeley.edu/

 On Fri, Apr 24, 2015 at 11:24 AM, shane knapp skn...@berkeley.edu wrote:

 jenkins is currently unreachable.  i'm not entirely sure why, as i can't
 ssh in to the box and see what's going on.  i've filed a ticket and will
 let everyone know when i have more information.

 shane





Re: [discuss] ending support for Java 6?

2015-04-30 Thread shane knapp
something to keep in mind:  we can easily support java 6 for the build
environment, particularly if there's a definite EOL.

i'd like to fix our java versioning 'problem', and this could be a big
instigator...  right now we're hackily setting java_home in test invocation
on jenkins, which really isn't the best.  if i decide, within jenkins, to
reconfigure every build to 'do the right thing' WRT java version, then i
will clean up the old mess and pay down on some technical debt.

or i can just install java 6 and we use that as JAVA_HOME on a
build-by-build basis.

this will be a few days of prep and another morning-long downtime if i do
the right thing (within jenkins), and only a couple of hours the hacky way
(system level).

either way, we can test on java 6.  :)
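
for the hacky (system level) version, a build's shell step would just pick its
own jdk before doing anything else -- a rough sketch, with a hypothetical jdk 6
install path (the real location would be documented per worker):

# hypothetical java 6 location; adjust to wherever it actually lands
export JAVA_HOME=/usr/java/jdk1.6.0_45
export PATH="$JAVA_HOME/bin:$PATH"

java -version    # should now report 1.6.x for this build only
# ...then kick off the usual test entry point, e.g. ./dev/run-tests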

On Thu, Apr 30, 2015 at 1:00 PM, Koert Kuipers ko...@tresata.com wrote:

 nicholas started it! :)

 for java 6 i would have said the same thing about 1 year ago: it is foolish
 to drop it. but i think the time is right about now.
 about half our clients are on java 7 and the other half have active plans
 to migrate to it within 6 months.

 On Thu, Apr 30, 2015 at 3:57 PM, Reynold Xin r...@databricks.com wrote:

  Guys thanks for chiming in, but please focus on Java here. Python is an
  entirely separate issue.
 
 
  On Thu, Apr 30, 2015 at 12:53 PM, Koert Kuipers ko...@tresata.com
 wrote:
 
  i am not sure eol means much if it is still actively used. we have a lot
  of clients with centos 5 (for which we still support python 2.4 in some
  form or another, fun!). most of them are on centos 6, which means python
  2.6. by cutting out python 2.6 you would cut out the majority of the
 actual
  clusters i am aware of. unless your intention is to truly make something
  academic i don't think that is wise.
 
  On Thu, Apr 30, 2015 at 3:48 PM, Nicholas Chammas 
  nicholas.cham...@gmail.com wrote:
 
  (On that note, I think Python 2.6 should be next on the chopping block
  sometime later this year, but that’s for another thread.)
 
  (To continue the parenthetical, Python 2.6 was in fact EOL-ed in
 October
  of
  2013. https://www.python.org/download/releases/2.6.9/)
  ​
 
  On Thu, Apr 30, 2015 at 3:18 PM Nicholas Chammas 
  nicholas.cham...@gmail.com
  wrote:
 
   I understand the concern about cutting out users who still use Java
 6,
  and
   I don't have numbers about how many people are still using Java 6.
  
   But I want to say at a high level that I support deprecating older
   versions of stuff to reduce our maintenance burden and let us use
 more
   modern patterns in our code.
  
   Maintenance always costs way more than initial development over the
   lifetime of a project, and for that reason anti-support is just as
   important as support.
  
   (On that note, I think Python 2.6 should be next on the chopping
 block
   sometime later this year, but that's for another thread.)
  
   Nick
  
  
   On Thu, Apr 30, 2015 at 3:03 PM Reynold Xin r...@databricks.com
  wrote:
  
   This has been discussed a few times in the past, but now Oracle has
  ended
   support for Java 6 for over a year, I wonder if we should just drop
  Java 6
   support.
  
   There is one outstanding issue Tom has brought to my attention:
  PySpark on
   YARN doesn't work well with Java 7/8, but we have an outstanding
 pull
   request to fix that.
  
   https://issues.apache.org/jira/browse/SPARK-6869
   https://issues.apache.org/jira/browse/SPARK-1920
  
  
 
 
 
 



Re: [discuss] ending support for Java 6?

2015-05-04 Thread shane knapp
...and now the workers all have java6 installed.

https://issues.apache.org/jira/browse/SPARK-1437

sadly, the built-in jenkins jdk management doesn't allow us to choose a JDK
version within matrix projects...  so we need to manage this stuff
manually.

On Sun, May 3, 2015 at 8:57 AM, shane knapp skn...@berkeley.edu wrote:

 that bug predates my time at the amplab...  :)

 anyways, just to restate: jenkins currently only builds w/java 7.  if you
 folks need 6, i can make it happen, but it will be a (smallish) bit of work.

 shane

 On Sun, May 3, 2015 at 2:14 AM, Sean Owen so...@cloudera.com wrote:

 Should be, but isn't what Jenkins does.
 https://issues.apache.org/jira/browse/SPARK-1437

 At this point it might be simpler to just decide that 1.5 will require
 Java 7 and then the Jenkins setup is correct.

 (NB: you can also solve this by setting bootclasspath to JDK 6 libs
 even when using javac 7+ but I think this is overly complicated.)

 On Sun, May 3, 2015 at 5:52 AM, Mridul Muralidharan mri...@gmail.com
 wrote:
  Hi Shane,
 
Since we are still maintaining support for jdk6, jenkins should be
  using jdk6 [1] to ensure we do not inadvertently use jdk7 or higher
  api which breaks source level compat.
  -source and -target is insufficient to ensure api usage is conformant
  with the minimum jdk version we are supporting.
 
  Regards,
  Mridul
 
  [1] Not jdk7 as you mentioned
 
  On Sat, May 2, 2015 at 8:53 PM, shane knapp skn...@berkeley.edu
 wrote:
  that's kinda what we're doing right now, java 7 is the
 default/standard on
  our jenkins.
 
  or, i vote we buy a butler's outfit for thomas and have a second
 jenkins
  instance...  ;)
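
(ps, on the -source/-target point above: those flags only pin the language
level and bytecode version.  without a jdk 6 rt.jar on the bootclasspath, code
that calls jdk 7-only library APIs still compiles cleanly and only fails at
runtime on java 6.  the bootclasspath trick sean mentions looks roughly like
this, with a hypothetical jdk 6 path and Foo.java standing in for whatever is
being compiled:

javac -source 1.6 -target 1.6 \
      -bootclasspath /usr/java/jdk1.6.0_45/jre/lib/rt.jar \
      Foo.java
)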





Re: [discuss] ending support for Java 6?

2015-05-04 Thread shane knapp
sgtm

On Mon, May 4, 2015 at 11:23 AM, Patrick Wendell pwend...@gmail.com wrote:

 If we just set JAVA_HOME in dev/run-tests-jenkins, I think it should work.

 On Mon, May 4, 2015 at 7:20 PM, shane knapp skn...@berkeley.edu wrote:
  ...and now the workers all have java6 installed.
 
  https://issues.apache.org/jira/browse/SPARK-1437
 
  sadly, the built-in jenkins jdk management doesn't allow us to choose a
 JDK
  version within matrix projects...  so we need to manage this stuff
  manually.
 
  On Sun, May 3, 2015 at 8:57 AM, shane knapp skn...@berkeley.edu wrote:
 
  that bug predates my time at the amplab...  :)
 
  anyways, just to restate: jenkins currently only builds w/java 7.  if
 you
  folks need 6, i can make it happen, but it will be a (smallish) bit of
 work.
 
  shane
 
  On Sun, May 3, 2015 at 2:14 AM, Sean Owen so...@cloudera.com wrote:
 
  Should be, but isn't what Jenkins does.
  https://issues.apache.org/jira/browse/SPARK-1437
 
  At this point it might be simpler to just decide that 1.5 will require
  Java 7 and then the Jenkins setup is correct.
 
  (NB: you can also solve this by setting bootclasspath to JDK 6 libs
  even when using javac 7+ but I think this is overly complicated.)
 
  On Sun, May 3, 2015 at 5:52 AM, Mridul Muralidharan mri...@gmail.com
  wrote:
   Hi Shane,
  
 Since we are still maintaining support for jdk6, jenkins should be
   using jdk6 [1] to ensure we do not inadvertently use jdk7 or higher
   api which breaks source level compat.
   -source and -target is insufficient to ensure api usage is conformant
   with the minimum jdk version we are supporting.
  
   Regards,
   Mridul
  
   [1] Not jdk7 as you mentioned
  
   On Sat, May 2, 2015 at 8:53 PM, shane knapp skn...@berkeley.edu
  wrote:
   that's kinda what we're doing right now, java 7 is the
  default/standard on
   our jenkins.
  
   or, i vote we buy a butler's outfit for thomas and have a second
  jenkins
   instance...  ;)
 
 
 



Re: github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread shane knapp
sure, i'll kill all of the current spark prb builds...

On Mon, Apr 27, 2015 at 11:34 AM, Reynold Xin r...@databricks.com wrote:

 Shane - can we purge all the outstanding builds so we are not running
 stuff against stale PRs?


 On Mon, Apr 27, 2015 at 11:30 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 And unfortunately, many Jenkins executor slots are being taken by stale
 Spark PRs...

 On Mon, Apr 27, 2015 at 2:25 PM shane knapp skn...@berkeley.edu wrote:

  anyways, the build queue is SLAMMED...  we're going to need at least a
 day
  to catch up w/this.  i'll be keeping an eye on system loads and whatnot
 all
  day today.
 
  whee!
 
  On Mon, Apr 27, 2015 at 11:18 AM, shane knapp skn...@berkeley.edu
 wrote:
 
   somehow, the power outage on friday caused the pull request builder to
    lose its config entirely...  i'm not sure why, but after i added the
  oauth
   token back, we're now catching up on the weekend's pull request
 builds.
  
   have i mentioned how much i hate this plugin?  ;)
  
   sorry for the inconvenience...
  
   shane
  
 





Re: github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread shane knapp
never mind, looks like you guys are already on it.  :)

On Mon, Apr 27, 2015 at 11:35 AM, shane knapp skn...@berkeley.edu wrote:

 sure, i'll kill all of the current spark prb build...

 On Mon, Apr 27, 2015 at 11:34 AM, Reynold Xin r...@databricks.com wrote:

 Shane - can we purge all the outstanding builds so we are not running
 stuff against stale PRs?


 On Mon, Apr 27, 2015 at 11:30 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 And unfortunately, many Jenkins executor slots are being taken by stale
 Spark PRs...

 On Mon, Apr 27, 2015 at 2:25 PM shane knapp skn...@berkeley.edu wrote:

  anyways, the build queue is SLAMMED...  we're going to need at least a
 day
  to catch up w/this.  i'll be keeping an eye on system loads and
 whatnot all
  day today.
 
  whee!
 
  On Mon, Apr 27, 2015 at 11:18 AM, shane knapp skn...@berkeley.edu
 wrote:
 
   somehow, the power outage on friday caused the pull request builder
 to
    lose its config entirely...  i'm not sure why, but after i added the
  oauth
   token back, we're now catching up on the weekend's pull request
 builds.
  
   have i mentioned how much i hate this plugin?  ;)
  
   sorry for the inconvenience...
  
   shane
  
 






github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread shane knapp
somehow, the power outage on friday caused the pull request builder to lose
its config entirely...  i'm not sure why, but after i added the oauth
token back, we're now catching up on the weekend's pull request builds.

have i mentioned how much i hate this plugin?  ;)

sorry for the inconvenience...

shane


Re: github pull request builder FAIL, now WIN(-ish)

2015-04-27 Thread shane knapp
anyways, the build queue is SLAMMED...  we're going to need at least a day
to catch up w/this.  i'll be keeping an eye on system loads and whatnot all
day today.

whee!

On Mon, Apr 27, 2015 at 11:18 AM, shane knapp skn...@berkeley.edu wrote:

 somehow, the power outage on friday caused the pull request builder to
 lose its config entirely...  i'm not sure why, but after i added the oauth
 token back, we're now catching up on the weekend's pull request builds.

 have i mentioned how much i hate this plugin?  ;)

 sorry for the inconvenience...

 shane


