Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
reminder: this is starting in 10 minutes On Wed, Aug 27, 2014 at 4:13 PM, shane knapp skn...@berkeley.edu wrote: tomorrow morning i will be upgrading jenkins to the latest/greatest (1.577). at 730am, i will put jenkins in to a quiet period, so no new builds will be accepted. once any

Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
jenkins is now coming down. On Thu, Aug 28, 2014 at 7:19 AM, shane knapp skn...@berkeley.edu wrote: reminder: this is starting in 10 minutes On Wed, Aug 27, 2014 at 4:13 PM, shane knapp skn...@berkeley.edu wrote: tomorrow morning i will be upgrading jenkins to the latest/greatest (1.577

Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
! :) On Thu, Aug 28, 2014 at 7:46 AM, shane knapp skn...@berkeley.edu wrote: jenkins is now coming down. On Thu, Aug 28, 2014 at 7:19 AM, shane knapp skn...@berkeley.edu wrote: reminder: this is starting in 10 minutes On Wed, Aug 27, 2014 at 4:13 PM, shane knapp skn...@berkeley.edu

Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
this one job is blocking the jenkins restart: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19406/ i'm about to kill it so that i can get this done. i'll restart the job after jenkins is back up. On Thu, Aug 28, 2014 at 7:51 AM, shane knapp skn...@berkeley.edu wrote

Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
all clear: jenkins and all plugins have been updated! On Thu, Aug 28, 2014 at 7:51 AM, shane knapp skn...@berkeley.edu wrote: jenkins is upgraded, but a few jobs sneaked in before i could do the plugin updates. i've put jenkins in quiet mode again, and once the spark builds finish, i'll

Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-28 Thread shane knapp
this, Shane. On Thursday, August 28, 2014, shane knapp skn...@berkeley.edu wrote: all clear: jenkins and all plugins have been updated! On Thu, Aug 28, 2014 at 7:51 AM, shane knapp skn...@berkeley.edu wrote: jenkins is upgraded, but a few jobs sneaked in before i could do the plugin updates

emergency jenkins restart, aug 29th, 730am-9am PDT -- plus a postmortem

2014-08-28 Thread shane knapp
as with all software upgrades, sometimes things don't always work as expected. a recent change to stapler[1], to verbosely report NotExportableExceptions[2] is spamming our jenkins log file with stack traces, which is growing rather quickly (1.2G since 9am). this has been reported to the jenkins

Re: emergency jenkins restart, aug 29th, 730am-9am PDT -- plus a postmortem

2014-08-29 Thread shane knapp
reminder: this is happening right now. jenkins is currently in quiet mode, and in ~30 minutes, will be briefly going down. On Thu, Aug 28, 2014 at 1:03 PM, shane knapp skn...@berkeley.edu wrote: as with all software upgrades, sometimes things don't always work as expected. a recent

Re: emergency jenkins restart, aug 29th, 730am-9am PDT -- plus a postmortem

2014-08-29 Thread shane knapp
this is done. On Fri, Aug 29, 2014 at 7:32 AM, shane knapp skn...@berkeley.edu wrote: reminder: this is happening right now. jenkins is currently in quiet mode, and in ~30 minutes, will be briefly going down. On Thu, Aug 28, 2014 at 1:03 PM, shane knapp skn...@berkeley.edu wrote

new jenkins plugin installed and ready for use

2014-08-29 Thread shane knapp
i have always found the 'Rebuild' plugin super useful: https://wiki.jenkins-ci.org/display/JENKINS/Rebuild+Plugin this is installed and enabled. enjoy! shane

hey spark developers! intro from shane knapp, devops engineer @ AMPLab

2014-09-02 Thread shane knapp
so, i had a meeting w/the databricks guys on friday and they recommended i send an email out to the list to say 'hi' and give you guys a quick intro. :) hi! i'm shane knapp, the new AMPLab devops engineer, and will be spending time getting the jenkins build infrastructure up to production

Re: quick jenkins restart

2014-09-02 Thread shane knapp
and we're back and building! On Tue, Sep 2, 2014 at 5:07 PM, shane knapp skn...@berkeley.edu wrote: since our queue is really short, i'm waiting for a couple of builds to finish and will be restarting jenkins to install/update some plugins. the github pull request builder looks like it has

amplab jenkins is down

2014-09-04 Thread shane knapp
i am trying to get things up and running, but it looks like either the firewall gateway or jenkins server itself is down. i'll update as soon as i know more.

Re: amplab jenkins is down

2014-09-04 Thread shane knapp
looks like a power outage in soda hall. more updates as they happen. On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote: i am trying to get things up and running, but it looks like either the firewall gateway or jenkins server itself is down. i'll update as soon as i

Re: amplab jenkins is down

2014-09-04 Thread shane knapp
ASAP. On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu wrote: looks like a power outage in soda hall. more updates as they happen. On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote: i am trying to get things up and running, but it looks like either

Re: amplab jenkins is down

2014-09-04 Thread shane knapp
it's a faulty power switch on the firewall, which has been swapped out. we're about to reboot and be good to go. On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu wrote: looks like some hardware failed, and we're swapping in a replacement. i don't have more specific

Re: amplab jenkins is down

2014-09-04 Thread shane knapp
AND WE'RE UP! sorry that this took so long... i'll send out a more detailed explanation of what happened soon. now, off to back up jenkins. shane On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu wrote: it's a faulty power switch on the firewall, which has been swapped out

Re: amplab jenkins is down

2014-09-04 Thread shane knapp
/pull/2277#issuecomment-54549106. Do we need some smelling salts? On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu wrote: i'd ping the Jenkinsmench... the master was completely offline, so any new jobs wouldn't have reached it. any jobs that were queued when power was lost

Re: amplab jenkins is down

2014-09-04 Thread shane knapp
i'm going to restart jenkins and see if that fixes things. On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu wrote: looking On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It appears that our main man is having trouble https

Re: amplab jenkins is down

2014-09-04 Thread shane knapp
://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19797/console Jenkins was unable to execute a git fetch? On Thu, Sep 4, 2014 at 7:58 PM, shane knapp skn...@berkeley.edu wrote: i'm going to restart jenkins and see if that fixes things. On Thu, Sep 4, 2014 at 4:56

Re: amplab jenkins is down

2014-09-05 Thread shane knapp
/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull are working now, though this last one was from ~5 hours ago. On Fri, Sep 5, 2014 at 1:02 AM, shane knapp skn...@berkeley.edu wrote: yep. that's exactly the behavior i saw earlier, and will be figuring out first thing

Re: amplab jenkins is down

2014-09-05 Thread shane knapp
://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull are working now, though this last one was from ~5 hours ago. On Fri, Sep 5, 2014 at 1:02 AM, shane knapp skn...@berkeley.edu wrote: yep. that's exactly the behavior i saw earlier

yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-09 Thread shane knapp
since the power incident last thursday, the github pull request builder plugin is still not really working 100%. i found an open issue w/jenkins[1] that could definitely be affecting us, i will be pausing builds early thursday morning and then restarting jenkins. i'll send out a reminder

Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-10 Thread shane knapp
/spark/pull/2339#issuecomment-55165937). Hopefully that will be resolved tomorrow. Nick On Tue, Sep 9, 2014 at 5:00 PM, shane knapp skn...@berkeley.edu wrote: since the power incident last thursday, the github pull request builder plugin is still not really working 100%. i found an open issue w

Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-11 Thread shane knapp
jenkins is now in quiet mode, and a restart is happening soon. On Wed, Sep 10, 2014 at 3:44 PM, shane knapp skn...@berkeley.edu wrote: that's kinda what we're hoping as well. :) On Wed, Sep 10, 2014 at 2:46 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I'm looking forward

Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-11 Thread shane knapp
...and the restart is done. On Thu, Sep 11, 2014 at 7:38 AM, shane knapp skn...@berkeley.edu wrote: jenkins is now in quiet mode, and a restart is happening soon. On Wed, Sep 10, 2014 at 3:44 PM, shane knapp skn...@berkeley.edu wrote: that's kinda what we're hoping as well. :) On Wed

Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-11 Thread shane knapp
: shane, is there anything we should do for pull requests that failed, but for unrelated issues? best, matt On 09/11/2014 11:29 AM, shane knapp wrote: ...and the restart is done. On Thu, Sep 11, 2014 at 7:38 AM, shane knapp skn...@berkeley.edu wrote: jenkins is now in quiet mode

FYI: jenkins systems patched to fix bash exploit

2014-09-26 Thread shane knapp
all of our systems were affected by the shellshock bug, and i've just patched everything w/the latest fix from redhat: https://access.redhat.com/articles/1200223 we're not running bash.x86_64 0:4.1.2-15.el6_5.2 on all of our systems. shane

Re: FYI: jenkins systems patched to fix bash exploit

2014-09-26 Thread shane knapp
we're not running bash.x86_64 0:4.1.2-15.el6_5.2 on all of our systems. s/not/now :)

jenkins downtime/system upgrade wednesday morning, 730am PDT

2014-09-29 Thread shane knapp
happy monday, everyone! remember a few weeks back when i upgraded jenkins, and unwittingly began DOSing our system due to massive log spam? well, that bug has been fixed w/the current release and i'd like to get our logging levels back to something more verbose than we have now. downtime will

Re: FYI: i've doubled the jenkins executors for every build node

2014-09-29 Thread shane knapp
:25 PM, Reynold Xin r...@databricks.com wrote: Thanks. We might see more failures due to contention on resources. Fingers crossed ... At some point it might make sense to run the tests in a VM or container. On Mon, Sep 29, 2014 at 2:20 PM, shane knapp skn...@berkeley.edu wrote: we were

Re: jenkins downtime/system upgrade wednesday morning, 730am PDT

2014-09-30 Thread shane knapp
https://issues.apache.org/jira/browse/SPARK-3745 On Tue, Sep 30, 2014 at 10:22 AM, shane knapp skn...@berkeley.edu wrote: (this time, reply to all) nice catch. there's a bug in spark/dev/check-license, which i've confirmed from the CLI. i'll open a bug and PR to fix it. On Mon, Sep 29

Re: jenkins downtime/system upgrade wednesday morning, 730am PDT

2014-09-30 Thread shane knapp
reminder: this is happening tomorrow morning. i will be putting jenkins in to quiet mode at ~7am, and then doing the upgrade once any stray builds finish. On Mon, Sep 29, 2014 at 1:43 PM, shane knapp skn...@berkeley.edu wrote: happy monday, everyone! remember a few weeks back when i

Re: amplab jenkins is down

2014-10-01 Thread shane knapp
nicholas.cham...@gmail.com wrote: On Thu, Sep 4, 2014 at 4:19 PM, shane knapp skn...@berkeley.edu wrote: on a side note, this incident will be accelerating our plan to move the entire jenkins infrastructure in to a managed datacenter environment. this will be our major push over the next couple

emergency jenkins restart -- massive security patch released

2014-10-03 Thread shane knapp
https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2014-10-01 there's some pretty big stuff that's been identified and we need to get this upgraded asap. i'll be killing off what's currently running, and will retrigger them all once we're done. sorry for the inconvenience.

Re: emergency jenkins restart -- massive security patch released

2014-10-03 Thread shane knapp
update complete. i'm retriggering builds now. On Fri, Oct 3, 2014 at 10:51 AM, shane knapp skn...@berkeley.edu wrote: https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2014-10-01 there's some pretty big stuff that's been identified and we need to get this upgraded asap

Re: new jenkins update + tentative release date

2014-10-13 Thread shane knapp
Jenkins is in quiet mode and the move will be starting after i have my coffee. :) On Sun, Oct 12, 2014 at 11:26 PM, Josh Rosen rosenvi...@gmail.com wrote: Reminder: this Jenkins migration is happening tomorrow morning (Monday). On Fri, Oct 10, 2014 at 1:01 PM, shane knapp skn...@berkeley.edu

Re: new jenkins update + tentative release date

2014-10-13 Thread shane knapp
quick update: we should be back up and running in the next ~60mins. On Mon, Oct 13, 2014 at 7:54 AM, shane knapp skn...@berkeley.edu wrote: Jenkins is in quiet mode and the move will be starting after i have my coffee. :) On Sun, Oct 12, 2014 at 11:26 PM, Josh Rosen rosenvi...@gmail.com

Re: new jenkins update + tentative release date

2014-10-13 Thread shane knapp
AND WE ARE LIIIVE! https://amplab.cs.berkeley.edu/jenkins/ have at it, folks! On Mon, Oct 13, 2014 at 10:15 AM, shane knapp skn...@berkeley.edu wrote: quick update: we should be back up and running in the next ~60mins. On Mon, Oct 13, 2014 at 7:54 AM, shane knapp skn...@berkeley.edu

Re: new jenkins update + tentative release date

2014-10-13 Thread shane knapp
On Mon, Oct 13, 2014 at 2:28 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for doing this work Shane. So is Jenkins in the new datacenter now? Do you know if the problems with checking out patches from GitHub should be resolved now? Here's an example from the past hour

Re: new jenkins update + tentative release date

2014-10-13 Thread shane knapp
Chammas nicholas.cham...@gmail.com wrote: Ah, that sucks. Thank you for looking into this. On Mon, Oct 13, 2014 at 5:43 PM, shane knapp skn...@berkeley.edu wrote: On Mon, Oct 13, 2014 at 2:28 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for doing this work Shane. So

short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to see if that helps w/the git fetch timeouts. this will require a short downtime (~20 mins for builds to finish, ~20 mins to downgrade), and will hopefully give us some insight in to wtf is going on. thanks for your patience...

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
ok, we're up and building... :crossesfingersfortheumpteenthtime: On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I support this effort. :thumbsup: On Wed, Oct 15, 2014 at 4:52 PM, shane knapp skn...@berkeley.edu wrote: i'm going to be downgrading our

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
four builds triggered and no timeouts. :crossestoes: :) On Wed, Oct 15, 2014 at 2:19 PM, shane knapp skn...@berkeley.edu wrote: ok, we're up and building... :crossesfingersfortheumpteenthtime: On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
ok, we've had about 10 spark pull request builds go through w/o any git timeouts. it seems that the git timeout issue might be licked. i will definitely be keeping an eye on this for the next few days. thanks for being patient! shane On Wed, Oct 15, 2014 at 2:27 PM, shane knapp skn

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
, shane knapp skn...@berkeley.edu wrote: ok, we've had about 10 spark pull request builds go through w/o any git timeouts. it seems that the git timeout issue might be licked. i will definitely be keeping an eye on this for the next few days. thanks for being patient! shane On Wed, Oct 15

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
if they still occur by ourselves. Do you think that’s worth trying at some point? Nick On Thu, Oct 16, 2014 at 2:04 PM, shane knapp skn...@berkeley.edu wrote: the bad news is that we've had a couple more failures due to timeouts, but the good news is that the frequency that these happen has

Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread shane knapp
, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote: hmm, strange. i'll take a look. On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com wrote: yes, I can compile locally, too but it seems that Jenkins is not happy now... https://amplab.cs.berkeley.edu/jenkins

Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread shane knapp
the source code issues in the Kinesis code that made stricter Java compilers reject it. - Patrick On Mon, Oct 20, 2014 at 5:28 PM, shane knapp skn...@berkeley.edu wrote: ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which fixed the SparkR build but apparently made Spark

Re: something wrong with Jenkins or something untested merged?

2014-10-21 Thread shane knapp
reserved. Which JDK is actually used by Jenkins? Cheng On 10/21/14 8:28 AM, shane knapp wrote: ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which fixed the SparkR build but apparently made Spark itself quite

Re: something wrong with Jenkins or something untested merged?

2014-10-21 Thread shane knapp
on. in this case, it's openjdk 7u65... and spark compilation fails. i've removed the 2nd JDK (7u71) from jenkins, and everything is back to normal. On Tue, Oct 21, 2014 at 11:51 AM, shane knapp skn...@berkeley.edu wrote: i'm currently in a meeting and will be starting to do some tests in ~1 hour or so

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-21 Thread shane knapp
have not tested this yet, could you give this a try? Davies [1] https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin On Fri, Oct 17, 2014 at 5:00 PM, shane knapp skn...@berkeley.edu wrote: actually, nvm, you have to run that command from our servers

your weekly git timeout update! TL;DR: i'm now almost certain we're not hitting rate limits.

2014-10-24 Thread shane knapp
so, things look like they've stabilized significantly over the past 10 days, and without any changes on our end: snip $ /root/tools/get_timeouts.sh 10 timeouts by date: 2014-10-14 -- 2 2014-10-16 -- 1 2014-10-19 -- 1 2014-10-20 -- 2 2014-10-23 -- 5 timeouts by project: 5
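The per-date tally in the snippet above can be sketched in Python. This is only an illustration: the contents of get_timeouts.sh are not shown in the archive, so the input format (a list of (date, project) pairs) and the function names here are assumptions, not the actual script.

```python
from collections import Counter


def timeouts_by_date(records):
    """Count timeout events per date.

    `records` is assumed to be an iterable of (date, project) pairs;
    the real script's input format is not shown in the thread.
    """
    counts = Counter(date for date, _project in records)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return sorted(counts.items())


def format_report(records):
    """Render the counts in the 'date -- count' layout seen in the snippet."""
    lines = ["timeouts by date:"]
    lines += [f"{date} -- {count}" for date, count in timeouts_by_date(records)]
    return "\n".join(lines)
```

Grouping by project, as the "timeouts by project" part of the output suggests, would work the same way with the second element of each pair.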

jenkins downtime tomorrow morning ~6am-8am PDT

2014-10-27 Thread shane knapp
i'll be bringing jenkins down tomorrow morning for some system maintenance and to get our backups kicked off. i do expect to have the system back up and running before 8am. please let me know ASAP if i need to reschedule this. thanks, shane

jenkins emergency restart now, was Re: jenkins downtime tomorrow morning ~6am-8am PDT

2014-10-27 Thread shane knapp
the jobs i've killed. this DOES NOT affect the restart/maintenance tomorrow morning. sorry about the inconvenience, shane On Mon, Oct 27, 2014 at 10:46 AM, shane knapp skn...@berkeley.edu wrote: i'll be bringing jenkins down tomorrow morning for some system maintenance and to get our backups

Re: jenkins emergency restart now, was Re: jenkins downtime tomorrow morning ~6am-8am PDT

2014-10-27 Thread shane knapp
ok we're back up and building. i've retriggered the jobs i killed. On Mon, Oct 27, 2014 at 1:24 PM, shane knapp skn...@berkeley.edu wrote: so, i'm having a race condition between a plugin i installed putting jenkins in to quiet mode and it failing to perform a backup from this past weekend

Re: jenkins downtime tomorrow morning ~6am-8am PDT

2014-10-28 Thread shane knapp
this is done, and jenkins is up and building again. On Mon, Oct 27, 2014 at 10:46 AM, shane knapp skn...@berkeley.edu wrote: i'll be bringing jenkins down tomorrow morning for some system maintenance and to get our backups kicked off. i do expect to have the system back up and running before

[important] jenkins down

2014-11-20 Thread shane knapp
i noticed that there were no builds, and noticed that it's throwing a bunch of exceptions in the log file. i'm looking in to this right now and will update when i get things rolling again. sorry for the inconvenience, shane

Re: [important] jenkins down

2014-11-20 Thread shane knapp
cautious when updating even recommended plugins. sorry for the disruption! shane On Thu, Nov 20, 2014 at 10:21 AM, shane knapp skn...@berkeley.edu wrote: i noticed that there were no builds, and noticed that it's throwing a bunch of exceptions in the log file. i'm looking in to this right now

jenkins downtime: 730-930am, 12/12/14

2014-12-01 Thread shane knapp
i'll send out a reminder next week, but i wanted to give a heads up: i'll be bringing down the entire jenkins infrastructure for reboots and system updates. please let me know if there are any conflicts with this, thanks! shane

adding new jenkins worker nodes to eventually replace existing ones

2014-12-09 Thread shane knapp
i just turned up a new jenkins slave (amp-jenkins-worker-01) to ensure it builds properly. these machines have half the ram, same number of processors and more disk, which will hopefully help us achieve more than the ~15-20% system utilization we're getting on the current

Re: adding new jenkins worker nodes to eventually replace existing ones

2014-12-09 Thread shane knapp
forgot to install git on this node. /headdesk i retriggered the failed spark prb jobs. On Tue, Dec 9, 2014 at 10:49 AM, shane knapp skn...@berkeley.edu wrote: i just turned up a new jenkins slave (amp-jenkins-worker-01) to ensure it builds properly. these machines have half the ram, same

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-10 Thread shane knapp
reminder -- this is happening friday morning @ 730am! On Mon, Dec 1, 2014 at 5:10 PM, shane knapp skn...@berkeley.edu wrote: i'll send out a reminder next week, but i wanted to give a heads up: i'll be bringing down the entire jenkins infrastructure for reboots and system updates. please

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-11 Thread shane knapp
. things there seem to be working just fine! i'm expecting to be up and building by 9am at the latest. i'll update this thread w/any new time estimates. word. shane, your rained-in devops guy :) On Wed, Dec 10, 2014 at 11:28 AM, shane knapp skn...@berkeley.edu wrote: reminder

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-12 Thread shane knapp
reminder: jenkins is going down NOW. On Thu, Dec 11, 2014 at 3:08 PM, shane knapp skn...@berkeley.edu wrote: here's the plan... reboots, of course, come last. :) pause build queue at 7am, kill off (and eventually retrigger) any stragglers at 8am. then begin maintenance: all systems

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-12 Thread shane knapp
downtime is extended to 10am PST so that i can finish testing the numpy upgrade... besides that, everything looks good and the system updates and reboots went off w/o a hitch. shane On Fri, Dec 12, 2014 at 7:26 AM, shane knapp skn...@berkeley.edu wrote: reminder: jenkins is going down NOW

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-12 Thread shane knapp
ok, we're back up w/all new jenkins workers. i'll be keeping an eye on these pretty closely today for any build failures caused by the new systems, and if things look bleak, i'll switch back to the original five. thanks for your patience! On Fri, Dec 12, 2014 at 8:47 AM, shane knapp skn

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-14 Thread shane knapp
josh rosen has this PR open to address the streaming test failures: https://github.com/apache/spark/pull/3687 On Sun, Dec 14, 2014 at 8:21 AM, WangTaoTheTonic barneystin...@aliyun.com wrote: Jenkins is still not available now as some unit tests(about streaming) failed all the time. Does it

Re: Archiving XML test reports for analysis

2014-12-15 Thread shane knapp
right now, the following logs are archived on to the master: local log_files=$( find .\ -name unit-tests.log -o\ -path ./sql/hive/target/HiveCompatibilitySuite.failed -o\ -path ./sql/hive/target/HiveCompatibilitySuite.hiveFailed -o\ -path
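As a rough illustration of the selection that the find expression above makes, here is a Python sketch covering only the patterns visible in the snippet: files named unit-tests.log plus the two HiveCompatibilitySuite paths. The rest of the original expression is cut off and is not reconstructed. The function name and the use of pathlib are assumptions for illustration.

```python
from pathlib import Path

# The two explicit paths visible in the snippet; the original list is
# truncated, so this set is incomplete by construction.
EXPLICIT_PATHS = {
    "sql/hive/target/HiveCompatibilitySuite.failed",
    "sql/hive/target/HiveCompatibilitySuite.hiveFailed",
}


def collect_log_files(root):
    """Return sorted paths (relative to root) of log files to archive.

    Hypothetical helper. Unlike the find command, it keeps only
    regular files, not directories with matching names.
    """
    root = Path(root)
    matches = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.name == "unit-tests.log" or rel in EXPLICIT_PATHS:
            matches.append(rel)
    return sorted(matches)
```

Calling collect_log_files(".") from the top of a checkout would list the matching files relative to that directory.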

Re: Archiving XML test reports for analysis

2014-12-15 Thread shane knapp
it roughly be if we uploaded all the logs for all these builds? Also, would Databricks be willing to offer up an S3 bucket for this purpose? Nick On Mon Dec 15 2014 at 11:48:44 AM shane knapp skn...@berkeley.edu wrote: right now, the following logs are archived on to the master: local

Re: Jenkins install reference

2015-02-03 Thread shane knapp
here's the wiki describing the system setup: https://cwiki.apache.org/confluence/display/SPARK/Spark+QA+Infrastructure we have 1 master and 8 worker nodes, 12 executors per worker (we'd be better off w/more and smaller worker nodes however). you don't need to install sbt -- it's in the build/

Re: spark 1.3 sbt build seems to be broken

2015-02-05 Thread shane knapp
] Latest remote head revision is: fba2dc663a644cfe76a744b5cace93e9d6646a25 Done. Took 2.5 sec Changes found from: https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/18/pollingLog/ On Thu, Feb 5, 2015 at 5:01 PM, shane knapp skn...@berkeley.edu wrote: https://amplab.cs.berkeley.edu/jenkins

spark 1.3 sbt build seems to be broken

2015-02-05 Thread shane knapp
https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/ we're seeing java OOMs and heap space errors: https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/19/console

Re: quick jenkins restart tomorrow morning, ~7am PST

2015-02-18 Thread shane knapp
/SparkPullRequestBuilder/27690/ On Wed, Feb 18, 2015 at 12:55 PM, shane knapp skn...@berkeley.edu wrote: i'll be kicking jenkins to up the open file limits on the workers. it should be a very short downtime, and i'll post updates on my progress tomorrow. shane

quick jenkins restart tomorrow morning, ~7am PST

2015-02-18 Thread shane knapp
i'll be kicking jenkins to up the open file limits on the workers. it should be a very short downtime, and i'll post updates on my progress tomorrow. shane

Re: emergency jenkins restart soon

2015-01-29 Thread shane knapp
the master builds triggered around ~1am last night (according to the logs), so it looks like we're back in business. On Wed, Jan 28, 2015 at 10:32 PM, shane knapp skn...@berkeley.edu wrote: np! the master builds haven't triggered yet, but let's give the rube goldberg machine a minute to get

Re: emergency jenkins restart soon

2015-01-28 Thread shane knapp
jenkins is back up and all builds have been retriggered... things are building and looking good, and i'll keep an eye on the spark master builds tonite and tomorrow. On Wed, Jan 28, 2015 at 9:56 PM, shane knapp skn...@berkeley.edu wrote: the spark master builds stopped triggering ~yesterday

Re: emergency jenkins restart soon

2015-01-28 Thread shane knapp
np! the master builds haven't triggered yet, but let's give the rube goldberg machine a minute to get its bearings. On Wed, Jan 28, 2015 at 10:31 PM, Reynold Xin r...@databricks.com wrote: Thanks for doing that, Shane! On Wed, Jan 28, 2015 at 10:29 PM, shane knapp skn...@berkeley.edu wrote

adding some temporary jenkins worker nodes...

2015-02-09 Thread shane knapp
...to help w/the build backlog. let's all welcome amp-jenkins-slave-{01..03} back to the fray!

jenkins redirect down (but jenkins is up!), lots of potential

2015-01-05 Thread shane knapp
UC Berkeley had some major maintenance done this past weekend, and long story short, not everything came back. our primary webserver's NFS is down and that means we're not serving websites, meaning that the redirect to jenkins is failing. jenkins is still up, and building some jobs, but we will

Re: jenkins redirect down (but jenkins is up!), lots of potential

2015-01-06 Thread shane knapp
down. In the meantime, though, you can continue to access Jenkins through https://hadrian.ist.berkeley.edu/jenkins/ On Mon, Jan 5, 2015 at 10:37 AM, shane knapp skn...@berkeley.edu wrote: UC Berkeley had some major maintenance done this past weekend, and long story short, not everything came

Re: extended jenkins downtime monday, march 16th, plus some hints at the future

2015-03-16 Thread shane knapp
and will squash anything else that pops up. On Mon, Mar 16, 2015 at 9:06 AM, shane knapp skn...@berkeley.edu wrote: looks like we're having some issues w/the pull request builder and cron stacktraces in the logs. i'll be investigating further and will update when i figure out what's going

Re: extended jenkins downtime monday, march 16th, plus some hints at the future

2015-03-16 Thread shane knapp
this is starting now. On Fri, Mar 13, 2015 at 10:12 AM, shane knapp skn...@berkeley.edu wrote: i'll be taking jenkins down for some much-needed plugin updates, as well as potentially upgrading jenkins itself. this will start at 730am PDT, and i'm hoping to have everything up by noon

extended jenkins downtime monday, march 16th, plus some hints at the future

2015-03-13 Thread shane knapp
i'll be taking jenkins down for some much-needed plugin updates, as well as potentially upgrading jenkins itself. this will start at 730am PDT, and i'm hoping to have everything up by noon. the move to the anaconda python will take place in the next couple of weeks as i'm in the process of

jenkins httpd being flaky

2015-03-13 Thread shane knapp
we just started having issues when visiting jenkins and getting 503 service unavailable errors. i'm on it and will report back with an all-clear.

Re: jenkins httpd being flaky

2015-03-13 Thread shane knapp
, but i think it was a confluence of events (httpd flaking, problems at github, mercury in retrograde, friday thinking it's monday). shane On Fri, Mar 13, 2015 at 1:08 PM, shane knapp skn...@berkeley.edu wrote: i tried a couple of things, but will also be doing a jenkins reboot as soon

Re: jenkins httpd being flaky

2015-03-13 Thread shane knapp
i tried a couple of things, but will also be doing a jenkins reboot as soon as the current batch of builds finish. On Fri, Mar 13, 2015 at 12:40 PM, shane knapp skn...@berkeley.edu wrote: ok we have a few different things happening: 1) httpd on the jenkins master is randomly (though

Re: PR Builder timing out due to ivy cache lock

2015-03-13 Thread shane knapp
, Mar 13, 2015 at 12:03 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: Here you are: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28571/consoleFull On Fri, Mar 13, 2015 at 11:58 AM, shane knapp skn...@berkeley.edu wrote: link to a build, please? On Fri, Mar 13

jenkins upgraded to 1.606....

2015-03-25 Thread shane knapp
...due to some big security fixes: https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2015-03-23 :) shane

short jenkins 7am downtime tomorrow morning (3-5-15)

2015-03-04 Thread shane knapp
the master and workers need some system and package updates, and i'll also be rebooting the machines as well. this shouldn't take very long to perform, and i expect jenkins to be back up and building by 9am at the *latest*. important note: i will NOT be updating jenkins or any of the plugins

[jenkins infra -- pls read ] installing anaconda, moving default python from 2.6 -> 2.7

2015-02-23 Thread shane knapp
good morning, developers! TL;DR: i will be installing anaconda and setting it in the system PATH so that your python will default to 2.7, as well as it taking over management of all of the sci-py packages. this is potentially a big change, so i'll be testing locally on my staging instance

Re: [jenkins infra -- pls read ] installing anaconda, moving default python from 2.6 -> 2.7

2015-02-23 Thread shane knapp
On Mon, Feb 23, 2015 at 11:36 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: The first concern for Spark will probably be to ensure that we still build and test against Python 2.6, since that's the minimum version of Python we support. sounds good... we can set up separate 2.6

Re: [jenkins infra -- pls read ] installing anaconda, moving default python from 2.6 -> 2.7

2015-02-25 Thread shane knapp
. shane On Mon, Feb 23, 2015 at 11:13 AM, shane knapp skn...@berkeley.edu wrote: good morning, developers! TL;DR: i will be installing anaconda and setting it in the system PATH so that your python will default to 2.7, as well as it taking over management of all of the sci-py packages

Re: [ERROR] bin/compute-classpath.sh: fails with false positive test for java 1.7 vs 1.6

2015-02-24 Thread shane knapp
it's not downgraded, it's your /etc/alternatives setup that's causing this. you can update all of those entries by executing the following commands (as root): update-alternatives --install /usr/bin/java java /usr/java/latest/bin/java 1 update-alternatives --install /usr/bin/javah javah

Re: Test all the things (Was: Unit test logs in Jenkins?)

2015-04-02 Thread shane knapp
https://issues.apache.org/jira/browse/SPARK-3431. That still needs work before it becomes possible. Nick On Thu, Apr 2, 2015 at 11:59 AM shane knapp skn...@berkeley.edu wrote: i agree with all of this. but can we please break up the tests and make them shorter? :) On Thu, Apr 2, 2015 at 8

extended jenkins downtime, thursday april 9th 7am-noon PDT (moving to anaconda python & more)

2015-04-03 Thread shane knapp
welcome to python2.7+, java 8 and more! :) i'll be doing a major upgrade to our build system next thursday morning. here's a quick list of what's going on: * installation of anaconda python on all worker nodes * installation of pypy 2.5.1 (python 2.7) on all nodes * matching installation of

Jenkins down

2015-04-24 Thread shane knapp
jenkins is currently unreachable. i'm not entirely sure why, as i can't ssh in to the box and see what's going on. i've filed a ticket and will let everyone know when i have more information. shane

Re: Jenkins down

2015-04-24 Thread shane knapp
looks like we had a power failure on campus, and our datacenter is working to bring things back up: http://systemstatus.berkeley.edu/ On Fri, Apr 24, 2015 at 11:24 AM, shane knapp skn...@berkeley.edu wrote: jenkins is currently unreachable. i'm not entirely sure why, as i can't ssh

Re: Jenkins down

2015-04-24 Thread shane knapp
. On Fri, Apr 24, 2015 at 3:18 PM, shane knapp skn...@berkeley.edu wrote: ok, jenkins is back up and building. we have a few things to mop up here (ganglia is sad), but i think we'll be good for the afternoon. shane On Fri, Apr 24, 2015 at 2:17 PM, shane knapp skn...@berkeley.edu wrote

Re: Jenkins down

2015-04-24 Thread shane knapp
AM, shane knapp skn...@berkeley.edu wrote: looks like we had a power failure on campus, and our datacenter is working to bring things back up: http://systemstatus.berkeley.edu/ On Fri, Apr 24, 2015 at 11:24 AM, shane knapp skn...@berkeley.edu wrote: jenkins is currently unreachable. i'm
