We recently hit a bunch of jenkins failures due to a full disk.

Just now I removed 172G worth of docker images from build2-deb9build-ansible;
I thought we had the docker cleanup automated by now?

Even after that, build-2 still uses 244G of its root file system, which doesn't
seem right.  Most of it is also in the deb9build-ansible lxc:


root@build-2 /var/lib/lxc/deb9build-ansible/rootfs # du -hs * | sort -h
[...]
2.2G    opt
5.8G    usr
8.1G    tmp   (what!)
33G     home
153G    var

The tmp/ directory has many mktemp-style folders like
196M    tmp.u3y02wgBNI
which all date from March to May this year. I will delete them now.
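Since these leftovers keep accumulating, a small cron-able sketch could purge
them by age. This helper is my own invention (name and the 30-day default are
not existing osmo-ci tooling):

```shell
# Hypothetical helper, not existing osmo-ci tooling: delete mktemp-style
# leftovers ("tmp.XXXXXXXXXX") in a directory once they are older than N days.
purge_stale_tmp() {
	dir=$1
	days=${2:-30}
	# -maxdepth 1: only direct children; -mtime +N: not modified for >N days
	find "$dir" -maxdepth 1 -name 'tmp.*' -mtime +"$days" -exec rm -rf {} +
}

# e.g. in a weekly cron job on build-2:
# purge_stale_tmp /var/lib/lxc/deb9build-ansible/rootfs/tmp 30
```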

home:
root@build-2 /var/lib/lxc/deb9build-ansible/rootfs/home/osmocom-build # du -hs *
0       bin
19G     jenkins
14G     jenkins_build_artifact_store
1.2G    osmo-ci

Interesting, I wasn't aware of us using the artifact store.
Seems to come from some manual builds between April and October.
Removing.

The 19G of jenkins workspaces seems ok.

But osmo-ci of 1.2G!?
That seems to be a manual build of the coverity job -- though the date is
pretty recent, so is our coverity job actually building in
~osmocom-build/osmo-ci instead of in a workspace?


Even after running the docker cleanup commands from the osmocom.org servers
wiki page:
  docker rm $(docker ps -a -q)
  docker rmi $(docker images -q -f dangling=true)
there are still 321 docker images around, most of which are months old.
Not sure why the above cleanups don't catch those.
I'm just going to indiscriminately blow all of them away now.
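For what it's worth, '-f dangling=true' only matches untagged image layers, so
tagged-but-unused images survive that cleanup. Before blowing everything away,
something like the following might be a middle ground (the 30-day / 720h
cutoff is my own pick, not from the wiki page):

```shell
# Remove stopped containers, then all images (tagged or not) that no
# container uses and that were created more than 30 days ago.
docker container prune -f
docker image prune -a -f --filter "until=720h"
```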

Maybe a good cleanup strategy would be to automatically wipe out the entire
build slave lxc every week or so and re-create it from scratch?
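If we go that route, a rough outline might look like this (container name
taken from this mail; the playbook name is purely hypothetical, and the real
job would need to drain the jenkins node first):

```shell
# Weekly, after taking the node offline in jenkins:
lxc-stop -n deb9build-ansible
lxc-destroy -n deb9build-ansible
# re-create and re-provision via ansible (playbook name is made up):
# ansible-playbook -i inventory deb9build.yml
```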

After this, we have on build-2:
Filesystem      Size  Used Avail Use% Mounted on
/dev/md2        438G   83G  333G  20% /


------ host-2

Similar story in the host-2 deb9build-ansible lxc: tons of docker images; I
just removed all of them.

But after that we still have
root@host2 ~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md2        438G  311G  105G  75% /

On host-2 though there are a lot of services running.

root@host2 / # du -hs * | sort -h
[...]
1.2G    usr
59G     var
75G     home
176G    external

[...]
2.7G    gerrit
3.1G    redmine-20170530-before-upgrade-to-3.4.tar
4.3G    mailman
5.7G    openmoko-wiki
7.8G    gitolite
9.9G    openmoko-people
29G     redmine
112G    jenkins

root@host2 /external/jenkins/home/jobs # du -hs * | sort -h
171M    nplab-m3ua-test
198M    master-osmo-pcu
241M    ttcn3-sip-test
251M    osmo-gsm-tester_build-osmo-bsc
262M    ttcn3-ggsn-test
287M    gerrit-osmo-ttcn3-hacks
297M    master-osmo-bsc
322M    master-libosmo-sccp
328M    osmo-gsm-tester_build-osmo-sgsn
355M    master-osmo-mgw
359M    master-libosmo-netif
365M    osmo-gsm-tester_build-osmo-iuh
390M    gerrit-asn1c
392M    gerrit-osmo-bsc
419M    ttcn3-nitb-sysinfo
445M    osmo-gsm-tester_build-osmo-msc
456M    osmo-gsm-tester_manual-build-all
461M    master-libosmocore
461M    TEST_osmocomBB_with_libosmocore_dep
482M    master-osmo-iuh
611M    master-osmo-sgsn
704M    gerrit-osmo-bts
748M    master-osmo-msc
756M    gerrit-osmo-msc
929M    master-openbsc
1.1G    master-osmo-bts
1.1G    ttcn3-hlr-test
1.2G    gerrit-libosmocore
1.2G    ttcn3-mgw-test
1.9G    osmo-gsm-tester-rnd_run
2.0G    ttcn3-sgsn-test
3.0G    ttcn3-msc-test
3.2G    osmo-gsm-tester_run
3.5G    master-asn1c
4.2G    ttcn3-bsc-test-sccplite
4.7G    osmo-gsm-tester_run-rnd
6.2G    osmo-gsm-tester_gerrit
6.3G    osmo-gsm-tester_run-prod
7.5G    osmo-gsm-tester_ttcn3
8.5G    ttcn3-bsc-test
43G     ttcn3-bts-test

It seems we are caching 211 ttcn3-bts-test builds. That seems a tad much.
Indeed,
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/configure
has "[ ] Discard old builds" (unchecked).
Looking in osmo-ci, the jobs/ttcn3-testsuites.yml has no 'build-discarder' set.
I guess we should add one? Any preferences for the discard policy? A month? A year?
(compare master-builds.yml)
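If we do add one, my guess is the addition to jobs/ttcn3-testsuites.yml would
look roughly like this jenkins-job-builder property (the retention numbers are
placeholders, not a proposal):

```yaml
properties:
  - build-discarder:
      days-to-keep: 30
      num-to-keep: 120
```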


----- admin-2

It seems I cannot log in there, or at least I don't know the IP address...
  ssh: Could not resolve hostname admin2.osmocom.org: Name or service not known
So I guess I can't check there.

~N
