Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-21 Thread shane knapp
i've seen a few more builds fail w/timeouts and it appears that we're definitely NOT hitting any rate limiting. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22005/console [jenkins@amp-jenkins-slave-01 ~]$ curl -i -H "Authorization: token REDACTED" https://api.github.com |

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-18 Thread Davies Liu
How can we know the changes have been applied? I checked several recent builds, and they all use the original configs. Davies On Fri, Oct 17, 2014 at 6:17 PM, Josh Rosen rosenvi...@gmail.com wrote: FYI, I edited the Spark Pull Request Builder job to try this out. Let’s see if it works (I’ll be

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-18 Thread Josh Rosen
I think that the fix was applied.  Take a look at  https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21874/consoleFull Here, I see a fetch command that mentions this specific PR branch rather than the wildcard that we had before: git fetch --tags --progress

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-18 Thread Davies Liu
Cool, the 4 most recent builds used the new configs, thanks! Let's run more builds. Davies On Fri, Oct 17, 2014 at 11:06 PM, Josh Rosen rosenvi...@gmail.com wrote: I think that the fix was applied. Take a look at

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread Davies Liu
One finding is that all the timeouts happened with this command: git fetch --tags --progress https://github.com/apache/spark.git +refs/pull/*:refs/remotes/origin/pr/* I think this may be an expensive call; we could try a cheaper one: git fetch --tags --progress
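A minimal sketch of the narrowing proposed above, assuming the job knows the pull request number: rather than the wildcard refspec, it fetches only the refs for one PR. The helper name pr_refspec, the example PR number 1234, and the exact per-PR refspec layout are assumptions for illustration; the thread does not show the final command in full.

```shell
#!/bin/sh
# Hypothetical helper: build a refspec covering a single pull request
# instead of the wildcard +refs/pull/*:refs/remotes/origin/pr/*.
# The per-PR layout below is an assumption, not quoted from the thread.
pr_refspec() {
    pr_id="$1"
    printf '+refs/pull/%s/*:refs/remotes/origin/pr/%s/*\n' "$pr_id" "$pr_id"
}

# Usage (needs network access; 1234 is a made-up PR number):
#   git fetch --tags --progress https://github.com/apache/spark.git "$(pr_refspec 1234)"
```

The idea is that a single-PR refspec asks the server for far fewer refs than the wildcard, which may be why the wildcard fetch is the one that times out.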

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread Josh Rosen
FYI, I edited the Spark Pull Request Builder job to try this out. Let’s see if it works (I’ll be around to revert if it doesn’t). On October 17, 2014 at 5:26:56 PM, Davies Liu (dav...@databricks.com) wrote: One finding is that all the timeouts happened with this command: git fetch --tags

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
the bad news is that we've had a couple more failures due to timeouts, but the good news is that the frequency at which these happen has decreased significantly (3 in the past ~18hr). seems like the git plugin downgrade has helped relieve the problem, but hasn't fixed it. i'll be looking into this

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
Thanks for continuing to look into this, Shane. One suggestion that Patrick brought up, if we have trouble getting to the bottom of this, is doing the git checkout ourselves in the run-tests-jenkins script and cutting out the Jenkins git plugin entirely. That way we can script retries and post
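A rough sketch of what scripted retries could look like if the fetch moved into a shell script, as suggested above. The retry function, its argument order, and the attempt count and delay in the usage line are all hypothetical; nothing here is taken from the actual run-tests-jenkins script.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to max_attempts times,
# sleeping delay_seconds between failed attempts.
retry() {
    max_attempts="$1"
    delay_seconds="$2"
    shift 2
    attempt=1
    while true; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay_seconds}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay_seconds"
    done
}

# Usage (needs network access):
#   retry 3 10 git fetch --tags --progress https://github.com/apache/spark.git
```

Wrapping the fetch this way would also give the script a single place to report a final failure, instead of leaving it to the Jenkins git plugin.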

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
yeah, at this point it might be worth trying. :) the absolutely irritating thing is that i am not seeing this happen w/any jobs other than the spark prb, nor does it seem to correlate w/time of day, network or system load, or what slave it runs on. nor are we hitting our limit of

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
On Thu, Oct 16, 2014 at 3:55 PM, shane knapp skn...@berkeley.edu wrote: i really, truly hate non-deterministic failures. Amen bruddah.

short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to see if that helps w/the git fetch timeouts. this will require a short downtime (~20 mins for builds to finish, ~20 mins to downgrade), and will hopefully give us some insight in to wtf is going on. thanks for your patience...

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread Nicholas Chammas
I support this effort. :thumbsup: On Wed, Oct 15, 2014 at 4:52 PM, shane knapp skn...@berkeley.edu wrote: i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to see if that helps w/the git fetch timeouts. this will require a short downtime (~20 mins for builds to finish, ~20

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
ok, we're up and building... :crossesfingersfortheumpteenthtime: On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I support this effort. :thumbsup: On Wed, Oct 15, 2014 at 4:52 PM, shane knapp skn...@berkeley.edu wrote: i'm going to be downgrading our

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
four builds triggered and no timeouts. :crossestoes: :) On Wed, Oct 15, 2014 at 2:19 PM, shane knapp skn...@berkeley.edu wrote: ok, we're up and building... :crossesfingersfortheumpteenthtime: On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread shane knapp
ok, we've had about 10 spark pull request builds go through w/o any git timeouts. it seems that the git timeout issue might be licked. i will definitely be keeping an eye on this for the next few days. thanks for being patient! shane On Wed, Oct 15, 2014 at 2:27 PM, shane knapp

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread Nicholas Chammas
A quick scan through the Spark PR board https://spark-prs.appspot.com/ shows no recent failures related to this git checkout problem. Looks promising! Nick On Wed, Oct 15, 2014 at 6:10 PM, shane knapp skn...@berkeley.edu wrote: ok, we've had about 10 spark pull request builds go through w/o