Hi Packmans,

I know it is lame to self-reply, ... but anyhow ... a tl;dr is at the end of the mail :)
On Monday, 20 April 2020, 15:21:42 CEST, Stefan Botter wrote:
> I hope, I can give more insight in the next few days.

Well, it is now "the next few days", so what happened?

The initial problem arose during the evening hours of Apr 1st, when a rather unusual blackout hit the part of town where my servers are hosted. I have a UPS, but it bridges only 8-10 minutes, and the blackout lasted 30 minutes. There should have been emergency power from a diesel generator (which, by the way, was scheduled to be replaced the following weekend, though that has been postponed due to COVID-19), but for unknown reasons the generator did not kick in. I was able to restart everything on Thursday morning.

A secondary problem then surfaced; it affected the whole system badly, and I was rather clueless about it until today. PMBS runs alongside my personal VMs as a VMware guest on my lab system (two ESXi hosts). The lab is set up according to best practices, with two network-facing switches and two separate switches for storage. The storage device is a Synology DS620 with four 1 TB SSDs, connected via iSCSI. Backups go over the storage network to a separate DS216+II and, until Apr 10th, were done by Synology's Advanced Backup for Business, which basically takes snapshots of the VMs and copies the changed blocks to the backup storage space. Since the blackout, every time a backup ran, at least one of the ESXi hosts froze or lost network connectivity.

Since Apr 15th, PMBS is backed up by simple means of rsync, with one backup copy created daily. This does not seem to put as heavy a strain on the network. I am still contemplating a versioned backup with rdiff-backup, which I use regularly on my other machines, but I am not sure whether my available backup space will be sufficient, or how long backup runs would take on PMBS. So this is on the "maybe-ToDo" list.

Still, I did not know the cause of the lock-ups. By chance I discovered very similar behavior with network interruptions early last week, when network connectivity was lost during a download of a VM image to my home system. It recovered automagically after 10-30 minutes and was reproducible. Over the course of the weekend and today I managed to investigate further and found that one of the network add-in cards in one of the servers acted strangely under load. I reconfigured the ESXi servers to use the LAN-on-mainboard (LOM) adapters only, and am now fairly confident that the system runs stably again. I have some spare quad-port cards lying around and will replace the presumably defective adapters some time in the future, to bring the lab back in line with best practices, but for now everything should work without frequent interruptions.

As the worldwide COVID-19 calamity and the now emergency-emerging ;) changes to the schooling environment are putting heavy demands for immediate action on the school's IT, I have had rather little time to work on "personal fun", so it took a while longer to resolve the branching issue that caused this thread. The reported errors were caused by the frequent unwanted shutdowns, which left some state-recording files for the source server and the schedulers with binary garbage at the end.

I thought it was a good idea to document the events and the sort-of solution, for you to enjoy and for me to remember, as I will probably have forgotten what happened and what I did in a few weeks :)

tl;dr: everything should work again without frequent interruptions.

Greetings,
Stefan
--
Stefan Botter, zu Hause, Bremen
