The RelEng team pulled together a summary of last night’s activity for all
y’all.  A big thank you to Eric Ball, Kevin Sandí, Jessica Gonzalez, Chris
Hoy Poy & Andrew Grimberg for all their help!

 

Work performed

*       Nexus3 snapshot repository was removed (along with all snapshot
images) due to suspected data corruption that was preventing us from
completing the upgrade.
*       Snapshots repository was re-created from scratch and re-connected to
the Nexus rules and privileges.
*       Nexus3 upgrade completed successfully.

 

Findings

*       Nexus3 docker.snapshot repository was holding very old assets
consuming 7.3 TB of storage, filling almost the entire capacity of the disk.
*       Nexus3 upgrade process was being blocked by the docker.snapshot
repository: a scanning step was getting stuck when it reached that repo.
*       We believe an old cleanup script (no longer in use) left behind some
partial artifacts that Nexus was not able to clean up, and that those
artifacts built up over time and caused the problems. By resetting the
repository entirely, these partial artifacts should no longer be a problem.

 

Current Status

*       Nexus3 service has been stable since 
*       Nexus3.onap.org is currently running on the latest version
(v3.37.3-02)

 

ETA on when we will consider the incident / tickets closed if no further
issues arise

*       Continue to monitor the service for the next 2 weeks
*       Tickets to be closed at that time:
IT-23485 <https://jira.linuxfoundation.org/browse/IT-23485> and
IT-23559 <https://jira.linuxfoundation.org/browse/IT-23559>

Thanks!
-kenny

 

From: [email protected] <[email protected]> On Behalf Of Kenny
Paul via lists.onap.org
Sent: Monday, January 31, 2022 3:57 PM
To: [email protected]; [email protected];
[email protected]
Subject: [onap-tsc] Emergency Nexus Down Time Today

 

My apologies for the short notice, but it was my judgment call based upon
the log4j remediation being blocked and our ability to take advantage of
the Lunar New Year holiday.

 

As you know, we have been trying to debug the ongoing Nexus issues.  As you
may also know, we have been unable to perform the necessary upgrade that is
required.

We believe the reason for the upgrade failures to be corruption in the
snapshot repository.  Because of the criticality to the ONAP community, on
my direction RelEng will be deleting the snapshot repos and then immediately
attempting to perform the upgrade that keeps failing.

 

This work will be kicked off at 01:00 UTC, Feb. 1st / 17:00 Pacific, Jan.
31st.

 

During this downtime we will be deleting all snapshots.  After completion of
the downtime you can re-build your snapshots by running "remerge".

 

Thank you for your patience, understanding and support.

 

Thanks!

-kenny


Kenny Paul, Sr. Technical Community Architect

  ONAP Project & LFN Governing Board

  [email protected] <mailto:[email protected]> ,
+1.510.766.5945, US Pacific time zone.

  Find time on my calendar: https://doodle.com/mm/kennypaul/book-a-time 

 




