The RelEng team pulled together a summary of last night's activity for all y'all. A big thank you to Eric Ball, Kevin Sandí, Jessica Gonzalez, Chris Hoy Poy & Andrew Grimberg for all their help!
Work performed
* The Nexus3 snapshot repository was removed (along with all snapshot images) due to suspected data corruption that was preventing us from completing the upgrade.
* The snapshots repository was re-created from scratch and reconnected to the Nexus rules and privileges.
* The Nexus3 upgrade completed successfully.

Findings
* The Nexus3 docker.snapshot repository was holding very old assets that consumed 7.3 TB of storage, filling almost the entire disk.
* The Nexus3 upgrade process was being blocked by the docker.snapshot repository, because a scanning step got stuck when it reached that repo.
* We believe an old cleanup script (no longer in use) left behind partial artifacts that Nexus was unable to clean up, and that those artifacts built up over time and caused the problems. Now that the repository has been reset entirely, these partial artifacts should no longer be a problem.

Current Status
* The Nexus3 service has been stable since the upgrade.
* Nexus3.onap.org is currently running the latest version (v3.37.3-02).

ETA for considering the incident/tickets closed, assuming no further issues:
* Continue to monitor the service for the next 2 weeks.
* Tickets to be closed at that time:
  IT-23485 <https://jira.linuxfoundation.org/browse/IT-23485>
  IT-23559 <https://jira.linuxfoundation.org/browse/IT-23559>

Thanks!
-kenny

From: [email protected] <[email protected]> On Behalf Of Kenny Paul via lists.onap.org
Sent: Monday, January 31, 2022 3:57 PM
To: [email protected]; [email protected]; [email protected]
Subject: [onap-tsc] Emergency Nexus Down Time Today

My apologies for the short notice, but it was my judgment call based upon the log4j remediation being blocked and our ability to take advantage of the Lunar New Year holiday.

As you know, we have been trying to debug the ongoing Nexus issues. As you may also know, we have been unable to perform the necessary upgrade. We believe the upgrade failures are caused by corruption in the snapshot repository.
Because of the criticality to the ONAP community, at my direction RelEng will delete the snapshot repos and then immediately attempt the upgrade that keeps failing. This work will kick off at 01:00 UTC, Feb. 1st / 17:00 Pacific, Jan. 31st.

During this downtime we will be deleting all snapshots. Once the downtime is complete, you can rebuild your snapshots by running "remerge".

Thank you for your patience, understanding and support.

Thanks!
-kenny

Kenny Paul, Sr. Technical Community Architect
ONAP Project & LFN Governing Board
[email protected] <mailto:[email protected]>, +1.510.766.5945, US Pacific time zone.
Find time on my calendar: https://doodle.com/mm/kennypaul/book-a-time
