On Fri, Jun 2, 2017 at 4:42 PM, Ben Nemec wrote:
>
>
> On 03/28/2017 05:01 PM, Ben Nemec wrote:
>
>> Final (hopefully) update:
>>
>> All active compute nodes have been rebooted and things seem to be stable
>> again. Jobs are even running a little faster, so I'm thinking
On 03/28/2017 05:01 PM, Ben Nemec wrote:
Final (hopefully) update:
All active compute nodes have been rebooted and things seem to be stable
again. Jobs are even running a little faster, so I'm thinking this had
a detrimental effect on performance too. I've set a reminder for about
two months from now to reboot again if we're still
To follow up on this, we've continued to hit this issue on other compute
nodes. Not surprising, of course. They've all been up for about the
same period of time and have had largely even workloads.
It has caused problems, though, because it is cropping up faster than I
can respond (it takes a
On 22 March 2017 at 22:36, Ben Nemec wrote:
> Hi all (owl?),
>
> You may have missed it in all the CI excitement the past couple of days, but
> we had a partial outage of rh1 last night. It turns out the OVS port issue
> Derek discussed in
>