Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-13 Thread Xiao Li
Thank you very much, Shane!

Xiao

On Mon, Jul 13, 2020 at 10:15 AM shane knapp ☠  wrote:

> alright, the system load graphs show that we've had a generally decreasing
> load since friday, and have burned through ~3k builds/day since the reboot
> last week!  i don't see many timeouts, and the PRB builds have been
> generally green for a couple of days.
>
> again, i will keep an eye on things but i feel we're out of the woods
> right now.  :)
>
> shane
>
> On Fri, Jul 10, 2020 at 3:43 PM Frank Yin  wrote:
>
>> Great. Thanks.
>>
>> On Fri, Jul 10, 2020 at 3:39 PM shane knapp ☠ 
>> wrote:
>>
>>> no, 8 hours is plenty.  things will speed up soon once the backlog of
>>> builds works through.  i limited the number of PRB builds to 4 per
>>> worker, and things are looking better.  let's see how we look next week.
>>>
>>> On Fri, Jul 10, 2020 at 3:31 PM Frank Yin  wrote:
>>>
 Can we also increase the build timeout?

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125617
 This one fails because it times out, not because of test failures.

 On Fri, Jul 10, 2020 at 2:16 PM Frank Yin  wrote:

> Yeah, that's what I figured -- those workers are under load. Thanks.
>
> On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ 
> wrote:
>
>> only 125561, 125562 and 125564 were impacted by -9.
>>
>> 125565 exited w/a code of 143, i.e. signal 15 (143 - 128, SIGTERM), which
>> means the process was sent a SIGTERM for unknown reasons.
>>
>> 125563 looks like mima failed due to a bunch of errors.
>>
>> i just spot checked a bunch of recent failed PRB builds from today
>> and they all seemed to be legit.
>>
>> another thing that might be happening is an overload of PRB builds on
>> the workers due to the backlog...  the workers are under a LOT of load
>> right now, and i can put some rate limiting in to see if that helps out.
>>
>> shane
>>
>> On Fri, Jul 10, 2020 at 11:31 AM Frank Yin 
>> wrote:
>>
>>> Like from build number 125565 to 125561, all impacted by kill -9.
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>>>
>>> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ 
>>> wrote:
>>>
 define "a lot" and provide some links to those builds, please.
 there are roughly 2000 builds per day, and i can't do more than keep a
 cursory eye on things.

 the infrastructure that the tests run on hasn't changed one bit on
 any of the workers, and 'kill -9' could be a timeout, flakiness caused 
 by
 old build processes remaining on the workers after the master went 
 down, or
 me trying to clean things up w/o a reboot.  or, perhaps, something 
 wrong
 w/the infra.  :)

 On Fri, Jul 10, 2020 at 9:28 AM Frank Yin 
 wrote:

> Agree, but I’ve seen a lot of kill by signal 9, assuming that
> infrastructure?
>
> On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ 
> wrote:
>
>> yeah, i can't do much for flaky tests...  just flaky
>> infrastructure.
>>
>>
>> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon <
>> gurwls...@gmail.com> wrote:
>>
>>> Couple of flaky tests can happen. It's usual. Seems it got
>>> better now at least. I will keep monitoring the builds.
>>>
>>> On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:
>>>
 Looks like Jenkins isn't stable still. My PR fails two times in
 a row:

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport



 --
 Sent from:
 http://apache-spark-developers-list.1001551.n3.nabble.com/


 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>>
>> --
>> Shane Knapp
>> Computer Guy / Voice of Reason
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-13 Thread shane knapp ☠
alright, the system load graphs show that we've had a generally decreasing
load since friday, and have burned through ~3k builds/day since the reboot
last week!  i don't see many timeouts, and the PRB builds have been
generally green for a couple of days.

again, i will keep an eye on things but i feel we're out of the woods right
now.  :)

shane

On Fri, Jul 10, 2020 at 3:43 PM Frank Yin  wrote:

> Great. Thanks.
>
> On Fri, Jul 10, 2020 at 3:39 PM shane knapp ☠  wrote:
>
>> no, 8 hours is plenty.  things will speed up soon once the backlog of
>> builds works through.  i limited the number of PRB builds to 4 per
>> worker, and things are looking better.  let's see how we look next week.
>>
>> On Fri, Jul 10, 2020 at 3:31 PM Frank Yin  wrote:
>>
>>> Can we also increase the build timeout?
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125617
>>> This one fails because it times out, not because of test failures.
>>>
>>> On Fri, Jul 10, 2020 at 2:16 PM Frank Yin  wrote:
>>>
 Yeah, that's what I figured -- those workers are under load. Thanks.

 On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ 
 wrote:

> only 125561, 125562 and 125564 were impacted by -9.
>
> 125565 exited w/a code of 143, i.e. signal 15 (143 - 128, SIGTERM), which
> means the process was sent a SIGTERM for unknown reasons.
>
> 125563 looks like mima failed due to a bunch of errors.
>
> i just spot checked a bunch of recent failed PRB builds from today and
> they all seemed to be legit.
>
> another thing that might be happening is an overload of PRB builds on
> the workers due to the backlog...  the workers are under a LOT of load
> right now, and i can put some rate limiting in to see if that helps out.
>
> shane
>
> On Fri, Jul 10, 2020 at 11:31 AM Frank Yin 
> wrote:
>
>> Like from build number 125565 to 125561, all impacted by kill -9.
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>>
>> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ 
>> wrote:
>>
>>> define "a lot" and provide some links to those builds, please.
>>> there are roughly 2000 builds per day, and i can't do more than keep a
>>> cursory eye on things.
>>>
>>> the infrastructure that the tests run on hasn't changed one bit on
>>> any of the workers, and 'kill -9' could be a timeout, flakiness caused 
>>> by
>>> old build processes remaining on the workers after the master went 
>>> down, or
>>> me trying to clean things up w/o a reboot.  or, perhaps, something wrong
>>> w/the infra.  :)
>>>
>>> On Fri, Jul 10, 2020 at 9:28 AM Frank Yin 
>>> wrote:
>>>
 Agree, but I’ve seen a lot of kill by signal 9, assuming that
 infrastructure?

 On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ 
 wrote:

> yeah, i can't do much for flaky tests...  just flaky
> infrastructure.
>
>
> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon 
> wrote:
>
>> Couple of flaky tests can happen. It's usual. Seems it got better
>> now at least. I will keep monitoring the builds.
>>
>> On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:
>>
>>> Looks like Jenkins isn't stable still. My PR fails two times in
>>> a row:
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp ☠
no, 8 hours is plenty.  things will speed up soon once the backlog of
builds works through.  i limited the number of PRB builds to 4 per
worker, and things are looking better.  let's see how we look next week.

On Fri, Jul 10, 2020 at 3:31 PM Frank Yin  wrote:

> Can we also increase the build timeout?
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125617
> This one fails because it times out, not because of test failures.
>
> On Fri, Jul 10, 2020 at 2:16 PM Frank Yin  wrote:
>
>> Yeah, that's what I figured -- those workers are under load. Thanks.
>>
>> On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ 
>> wrote:
>>
>>> only 125561, 125562 and 125564 were impacted by -9.
>>>
>>> 125565 exited w/a code of 143, i.e. signal 15 (143 - 128, SIGTERM), which
>>> means the process was sent a SIGTERM for unknown reasons.
>>>
>>> 125563 looks like mima failed due to a bunch of errors.
>>>
>>> i just spot checked a bunch of recent failed PRB builds from today and
>>> they all seemed to be legit.
>>>
>>> another thing that might be happening is an overload of PRB builds on
>>> the workers due to the backlog...  the workers are under a LOT of load
>>> right now, and i can put some rate limiting in to see if that helps out.
>>>
>>> shane
>>>
>>> On Fri, Jul 10, 2020 at 11:31 AM Frank Yin  wrote:
>>>
 Like from build number 125565 to 125561, all impacted by kill -9.

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console

 On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ 
 wrote:

> define "a lot" and provide some links to those builds, please.  there
> are roughly 2000 builds per day, and i can't do more than keep a cursory
> eye on things.
>
> the infrastructure that the tests run on hasn't changed one bit on any
> of the workers, and 'kill -9' could be a timeout, flakiness caused by old
> build processes remaining on the workers after the master went down, or me
> trying to clean things up w/o a reboot.  or, perhaps, something wrong 
> w/the
> infra.  :)
>
> On Fri, Jul 10, 2020 at 9:28 AM Frank Yin  wrote:
>
>> Agree, but I’ve seen a lot of kill by signal 9, assuming that
>> infrastructure?
>>
>> On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ 
>> wrote:
>>
>>> yeah, i can't do much for flaky tests...  just flaky infrastructure.
>>>
>>>
>>> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon 
>>> wrote:
>>>
 Couple of flaky tests can happen. It's usual. Seems it got better
 now at least. I will keep monitoring the builds.

 On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:

> Looks like Jenkins isn't stable still. My PR fails two times in a
> row:
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread Frank Yin
Yeah, that's what I figured -- those workers are under load. Thanks.

On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠  wrote:

> only 125561, 125562 and 125564 were impacted by -9.
>
> 125565 exited w/a code of 143, i.e. signal 15 (143 - 128, SIGTERM), which
> means the process was sent a SIGTERM for unknown reasons.
>
> 125563 looks like mima failed due to a bunch of errors.
>
> i just spot checked a bunch of recent failed PRB builds from today and
> they all seemed to be legit.
>
> another thing that might be happening is an overload of PRB builds on the
> workers due to the backlog...  the workers are under a LOT of load right
> now, and i can put some rate limiting in to see if that helps out.
>
> shane
>
> On Fri, Jul 10, 2020 at 11:31 AM Frank Yin  wrote:
>
>> Like from build number 125565 to 125561, all impacted by kill -9.
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>>
>> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ 
>> wrote:
>>
>>> define "a lot" and provide some links to those builds, please.  there
>>> are roughly 2000 builds per day, and i can't do more than keep a cursory
>>> eye on things.
>>>
>>> the infrastructure that the tests run on hasn't changed one bit on any
>>> of the workers, and 'kill -9' could be a timeout, flakiness caused by old
>>> build processes remaining on the workers after the master went down, or me
>>> trying to clean things up w/o a reboot.  or, perhaps, something wrong w/the
>>> infra.  :)
>>>
>>> On Fri, Jul 10, 2020 at 9:28 AM Frank Yin  wrote:
>>>
 Agree, but I’ve seen a lot of kill by signal 9, assuming that
 infrastructure?

 On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ 
 wrote:

> yeah, i can't do much for flaky tests...  just flaky infrastructure.
>
>
> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon 
> wrote:
>
>> Couple of flaky tests can happen. It's usual. Seems it got better now
>> at least. I will keep monitoring the builds.
>>
>> On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:
>>
>>> Looks like Jenkins isn't stable still. My PR fails two times in a
>>> row:
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp ☠
only 125561, 125562 and 125564 were impacted by -9.

125565 exited w/a code of 143, i.e. signal 15 (143 - 128, SIGTERM), which
means the process was sent a SIGTERM for unknown reasons.
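
The subtraction above relies on the common shell convention that a process killed by signal N reports an exit status of 128 + N. A minimal Python sketch of that decoding follows. It assumes the statuses shown in the Jenkins console follow that convention, and the helper name `describe_exit_status` is hypothetical, not part of any Jenkins tooling.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status.

    Assumption: statuses above 128 mean "killed by signal (status - 128)",
    which is the usual shell convention but not guaranteed for every tool.
    """
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with code {status}"


print(describe_exit_status(143))  # 143 - 128 = 15
print(describe_exit_status(137))  # 137 - 128 = 9
```

On a typical Linux worker, 143 decodes to SIGTERM and 137 to SIGKILL, which is the difference between a polite termination request and the 'kill -9' seen in the other builds.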

125563 looks like mima failed due to a bunch of errors.

i just spot checked a bunch of recent failed PRB builds from today and they
all seemed to be legit.

another thing that might be happening is an overload of PRB builds on the
workers due to the backlog...  the workers are under a LOT of load right
now, and i can put some rate limiting in to see if that helps out.

shane

On Fri, Jul 10, 2020 at 11:31 AM Frank Yin  wrote:

> Like from build number 125565 to 125561, all impacted by kill -9.
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>
> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠  wrote:
>
>> define "a lot" and provide some links to those builds, please.  there are
>> roughly 2000 builds per day, and i can't do more than keep a cursory eye on
>> things.
>>
>> the infrastructure that the tests run on hasn't changed one bit on any of
>> the workers, and 'kill -9' could be a timeout, flakiness caused by old
>> build processes remaining on the workers after the master went down, or me
>> trying to clean things up w/o a reboot.  or, perhaps, something wrong w/the
>> infra.  :)
>>
>> On Fri, Jul 10, 2020 at 9:28 AM Frank Yin  wrote:
>>
>>> Agree, but I’ve seen a lot of kill by signal 9, assuming that
>>> infrastructure?
>>>
>>> On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ 
>>> wrote:
>>>
 yeah, i can't do much for flaky tests...  just flaky infrastructure.


 On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon 
 wrote:

> Couple of flaky tests can happen. It's usual. Seems it got better now
> at least. I will keep monitoring the builds.
>
> On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:
>
>> Looks like Jenkins isn't stable still. My PR fails two times in a row:
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp ☠
define "a lot" and provide some links to those builds, please.  there are
roughly 2000 builds per day, and i can't do more than keep a cursory eye on
things.

the infrastructure that the tests run on hasn't changed one bit on any of
the workers, and 'kill -9' could be a timeout, flakiness caused by old
build processes remaining on the workers after the master went down, or me
trying to clean things up w/o a reboot.  or, perhaps, something wrong w/the
infra.  :)

On Fri, Jul 10, 2020 at 9:28 AM Frank Yin  wrote:

> Agree, but I’ve seen a lot of kills by signal 9; I'm assuming that's
> infrastructure?
>
> On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠  wrote:
>
>> yeah, i can't do much for flaky tests...  just flaky infrastructure.
>>
>>
>> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon 
>> wrote:
>>
>>> Couple of flaky tests can happen. It's usual. Seems it got better now at
>>> least. I will keep monitoring the builds.
>>>
>>> On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:
>>>
 Looks like Jenkins isn't stable still. My PR fails two times in a row:

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport




-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp ☠
yeah, i can't do much for flaky tests...  just flaky infrastructure.


On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon  wrote:

> Couple of flaky tests can happen. It's usual. Seems it got better now at
> least. I will keep monitoring the builds.
>
> On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:
>
>> Looks like Jenkins isn't stable still. My PR fails two times in a row:
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread Hyukjin Kwon
A couple of flaky tests can happen; that's usual. It seems to have gotten
better now, at least. I will keep monitoring the builds.

On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:

> Looks like Jenkins isn't stable still. My PR fails two times in a row:
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread ukby1234
Looks like Jenkins still isn't stable. My PR failed two times in a row:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp ☠
i'm seeing green PRB builds now, so i feel that we've gotten things
building again!  :)

On Thu, Jul 9, 2020 at 5:33 PM Hyukjin Kwon  wrote:

> Thank you Shane.
>
> On Fri, Jul 10, 2020 at 2:35 AM, shane knapp ☠ wrote:
>
>> and -06 is back!  i'll keep an eye on things today, but suffice to
>> say on each worker i:
>>
>> 1) rebooted
>> 2) cleaned ~/.ivy2, ~/.m2, and other associated caches
>>
>> we should be g2g!  please reply here if you continue to see weirdness.
>>
>> On Thu, Jul 9, 2020 at 10:08 AM shane knapp ☠ 
>> wrote:
>>
>>> ok, we're back up and building (just waiting for one worker, -06 to
>>> finish cleaning itself up).
>>>
>>> On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠ 
>>> wrote:
>>>
 this is happening now.

 On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠ 
 wrote:

> this will be happening tomorrow...  today is Meeting Hell Day[tm].
>
> On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠ 
> wrote:
>
>> i wasn't able to get to it today, so i'm hoping to squeeze in a quick
>> trip to the colo tomorrow morning.  if not, then first thing thursday.

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread Hyukjin Kwon
Thank you Shane.

On Fri, Jul 10, 2020 at 2:35 AM, shane knapp ☠ wrote:

> and -06 is back!  i'll keep an eye on things today, but suffice to say
> on each worker i:
>
> 1) rebooted
> 2) cleaned ~/.ivy2, ~/.m2, and other associated caches
>
> we should be g2g!  please reply here if you continue to see weirdness.
>
> On Thu, Jul 9, 2020 at 10:08 AM shane knapp ☠  wrote:
>
>> ok, we're back up and building (just waiting for one worker, -06 to
>> finish cleaning itself up).
>>
>> On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠  wrote:
>>
>>> this is happening now.
>>>
>>> On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠ 
>>> wrote:
>>>
 this will be happening tomorrow...  today is Meeting Hell Day[tm].

 On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠ 
 wrote:

> i wasn't able to get to it today, so i'm hoping to squeeze in a quick
> trip to the colo tomorrow morning.  if not, then first thing thursday.


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp ☠
and -06 is back!  i'll keep an eye on things today, but suffice to say
on each worker i:

1) rebooted
2) cleaned ~/.ivy2, ~/.m2, and other associated caches
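
Step 2 could be sketched roughly as below. This is only an illustration, assuming the Ivy and Maven caches sit in their default per-user locations. The function name and the `dry_run` flag are hypothetical, and the "other associated caches" are not enumerated in this thread, so they are left out.

```python
import shutil
from pathlib import Path
from typing import List


def clean_build_caches(home: Path, dry_run: bool = True) -> List[Path]:
    """Remove the .ivy2 and .m2 caches under `home`.

    Returns the cache directories that were found. With dry_run=True
    nothing is deleted, so the result can be reviewed first.
    """
    found = []
    for name in (".ivy2", ".m2"):
        cache = home / name
        if cache.is_dir():
            if not dry_run:
                shutil.rmtree(cache)
            found.append(cache)
    return found
```

Calling `clean_build_caches(Path.home())` first lists what would go; passing `dry_run=False` actually deletes it, after which the next build re-downloads its dependencies.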

we should be g2g!  please reply here if you continue to see weirdness.

On Thu, Jul 9, 2020 at 10:08 AM shane knapp ☠  wrote:

> ok, we're back up and building (just waiting for one worker, -06 to finish
> cleaning itself up).
>
> On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠  wrote:
>
>> this is happening now.
>>
>> On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠  wrote:
>>
>>> this will be happening tomorrow...  today is Meeting Hell Day[tm].
>>>
>>> On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠ 
>>> wrote:
>>>
 i wasn't able to get to it today, so i'm hoping to squeeze in a quick
 trip to the colo tomorrow morning.  if not, then first thing thursday.



-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp ☠
ok, we're back up and building (just waiting for one worker, -06, to finish
cleaning itself up).

On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠  wrote:

> this is happening now.
>
> On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠  wrote:
>
>> this will be happening tomorrow...  today is Meeting Hell Day[tm].
>>
>> On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠  wrote:
>>
>>> i wasn't able to get to it today, so i'm hoping to squeeze in a quick
>>> trip to the colo tomorrow morning.  if not, then first thing thursday.


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread Dongjoon Hyun
Thank you always, Shane!

Bests,
Dongjoon.

On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠  wrote:

> this is happening now.
>
> On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠  wrote:
>
>> this will be happening tomorrow...  today is Meeting Hell Day[tm].
>>
>> On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠  wrote:
>>
>>> i wasn't able to get to it today, so i'm hoping to squeeze in a quick
>>> trip to the colo tomorrow morning.  if not, then first thing thursday.


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp ☠
this is happening now.

On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠  wrote:

> this will be happening tomorrow...  today is Meeting Hell Day[tm].
>
> On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠  wrote:
>
>> i wasn't able to get to it today, so i'm hoping to squeeze in a quick
>> trip to the colo tomorrow morning.  if not, then first thing thursday.


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread Jungtaek Lim
As a side note, I've raised patches for addressing two frequent flaky
tests, CliSuite [1] and HiveSessionImplSuite [2]. Hope this helps to
mitigate the situation.

1. https://github.com/apache/spark/pull/29036
2. https://github.com/apache/spark/pull/29039

On Thu, Jul 9, 2020 at 11:51 AM Hyukjin Kwon  wrote:

> Thanks Shane!
>
> BTW, it's getting serious, e.g.
> https://github.com/apache/spark/pull/28969.
> The tests have not passed in 7 days. Hopefully restarting the machines
> will make the current situation better :-)
>
> Separately, I am working on a PR to run the Spark tests in GitHub Actions.
> Hopefully we can use GitHub Actions and Jenkins together in the meantime.
>
>
> On Thu, Jul 9, 2020 at 1:07 AM, shane knapp ☠ wrote:
>
>> this will be happening tomorrow...  today is Meeting Hell Day[tm].
>>
>> On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠  wrote:
>>
>>> i wasn't able to get to it today, so i'm hoping to squeeze in a quick
>>> trip to the colo tomorrow morning.  if not, then first thing thursday.


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-08 Thread Hyukjin Kwon
Thanks Shane!

BTW, it's getting serious, e.g. https://github.com/apache/spark/pull/28969.
The tests have not passed in 7 days. Hopefully restarting the machines
will make the current situation better :-)

Separately, I am working on a PR to run the Spark tests in GitHub Actions.
Hopefully we can use GitHub Actions and Jenkins together in the meantime.


On Thu, Jul 9, 2020 at 1:07 AM, shane knapp ☠ wrote:

> this will be happening tomorrow...  today is Meeting Hell Day[tm].
>
> On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠  wrote:
>
>> i wasn't able to get to it today, so i'm hoping to squeeze in a quick
>> trip to the colo tomorrow morning.  if not, then first thing thursday.


Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-08 Thread shane knapp ☠
this will be happening tomorrow...  today is Meeting Hell Day[tm].

On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠  wrote:

> i wasn't able to get to it today, so i'm hoping to squeeze in a quick trip
> to the colo tomorrow morning.  if not, then first thing thursday.


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu