Thanks, Gabriel, for the extensive feedback. Actually, my ex-company added the code to really delete an RBD snapshot back in 2016 or so; it was part of 4.9 if I'm not mistaken. So I expect the code is there, but probably some exception is happening, or there's a regression...
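For reference, the direct delete on the RBD side via rados-java looks roughly like this - a minimal sketch, assuming a cephx "admin" user and placeholder monitor/key/pool/image values (not CloudStack's actual wiring). If the agent-side code throws before ever reaching snapRemove(), the snapshot survives on Ceph exactly as reported:

    import com.ceph.rados.IoCTX;
    import com.ceph.rados.Rados;
    import com.ceph.rbd.Rbd;
    import com.ceph.rbd.RbdImage;

    public class RbdSnapDeleteSketch {
        public static void main(String[] args) throws Exception {
            Rados rados = new Rados("admin");                   // placeholder cephx user
            rados.confSet("mon_host", "ceph-mon.example.com");  // placeholder monitor
            rados.confSet("key", "<cephx key>");                // placeholder key
            rados.connect();

            IoCTX io = rados.ioCtxCreate("rbd");                // pool name is an assumption
            try {
                Rbd rbd = new Rbd(io);
                RbdImage image = rbd.open("<volume uuid>");
                try {
                    // A protected snapshot cannot be removed, so unprotect it first.
                    if (image.snapIsProtected("<snapshot name>")) {
                        image.snapUnprotect("<snapshot name>");
                    }
                    // The actual delete on primary storage.
                    image.snapRemove("<snapshot name>");
                } finally {
                    rbd.close(image);
                }
            } finally {
                rados.ioCtxDestroy(io);
            }
        }
    }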
Cheers

On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher <gabrasc...@gmail.com> wrote:

> Thanks for the feedback, Andrija. It looks like delete was not totally
> supported then (am I missing something?). I will take a look into this and
> open a PR adding proper support for RBD snapshot deletion if necessary.
>
> Regarding the rollback, I have tested it several times and it worked;
> however, I see a weak point in the Ceph rollback implementation.
>
> It looks like Li Jerry was able to execute the rollback without any
> problem. Li, could you please post here the log output: "Attempting to
> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
> [snapshotid:%s]"? Andrija will not be able to see that log, as the
> exception happens prior to it; the only way of checking those values is
> via remote debugging. If you are able to post those values, it would help
> in sorting out what is wrong as well.
>
> I am checking the code base, running a few tests, and evaluating the log
> that you (Andrija) sent. What I can say for now is that the parameter
> "snapshotRelPath = snapshot.getPath()" [1] is a critical piece of code
> that can definitely break the rollback execution flow. My tests had
> pointed to one pattern, but now I see other possibilities. I will probably
> add a few parameters to the rollback/revert command instead of using the
> path, or review the path life-cycle and the different execution flows, in
> order to keep it safe to use.
> [1]
> https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
>
> A few details on the test environment and Ceph/RBD versions:
> CloudStack, KVM, and Ceph nodes are running Ubuntu 18.04.
> Ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
> (stable).
> RADOS Block Device has had snapshot rollback support since Ceph v10.0.2
> [https://github.com/ceph/ceph/pull/6878].
> Rados-java [https://github.com/ceph/rados-java] has supported snapshot
> rollback since 0.5.0; rados-java 0.5.0 is the version used by CloudStack
> 4.13.0.0.
>
> I will be updating here soon.
>
> On Sun, Sep 8, 2019 at 12:28, Wido den Hollander <w...@widodh.nl> wrote:
>
>>
>> On 9/8/19 5:26 AM, Andrija Panic wrote:
>> > Maaany releases ago, deleting a Ceph volume snapshot also only deleted
>> > it in the DB, so RBD performance became terrible with many tens of
>> > (i.e. hourly) snapshots. I'll try to verify this on 4.13 myself, but
>> > Wido and the guys will know better...
>>
>> I pinged Gabriel and he's looking into it. He'll get back to it.
>>
>> Wido
>>
>> > On Sat, Sep 7, 2019, 08:34 li jerry <div...@hotmail.com> wrote:
>> >
>> >> I found it had nothing to do with storage.cleanup.delay and
>> >> storage.cleanup.interval.
>> >>
>> >> The reason is that when DeleteSnapshotCmd is executed, because the RBD
>> >> snapshot has no copy on secondary storage, it only changes the
>> >> database information and never goes to the primary storage to delete
>> >> the snapshot.
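Jerry's observation, condensed into a hypothetical sketch (not CloudStack's actual code; the class and method names below are invented for illustration) - the real log from his test follows:

    public class DeleteFlowSketch {
        enum SnapshotLocation { PRIMARY_ONLY, BACKED_UP }

        // If the snapshot was never copied to secondary (backup) storage,
        // the delete only marks the DB row as removed; the RBD snapshot
        // on Ceph survives, untouched.
        static void deleteSnapshot(SnapshotLocation location) {
            if (location == SnapshotLocation.PRIMARY_ONLY) {
                System.out.println("Can't find snapshot on backup storage, delete it in db");
                markRemovedInDb();          // DB-only: primary storage is never contacted
            } else {
                deleteFromBackupStorage();  // not taken for RBD-only snapshots
            }
        }

        static void markRemovedInDb() { /* update the snapshots table only */ }
        static void deleteFromBackupStorage() { /* secondary-storage path */ }

        public static void main(String[] args) {
            deleteSnapshot(SnapshotLocation.PRIMARY_ONLY);
        }
    }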
>> >>
>> >> Log===========================
>> >>
>> >> 2019-09-07 23:27:00,118 DEBUG [c.c.a.ApiServlet] (qtp504527234-17:ctx-2e407b61) (logid:445cbea8) ===START=== 192.168.254.3 -- GET command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
>> >>
>> >> 2019-09-07 23:27:00,139 DEBUG [c.c.a.ApiServer] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) CIDRs from which account 'Acct[2f96c108-9408-11e9-a820-0200582b001a-admin]' is allowed to perform API calls: 0.0.0.0/0,::/0
>> >>
>> >> 2019-09-07 23:27:00,204 DEBUG [c.c.a.ApiServer] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) Retrieved cmdEventType from job info: SNAPSHOT.DELETE
>> >>
>> >> 2019-09-07 23:27:00,217 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:c34a368a) Add job-1378 into job monitoring
>> >>
>> >> 2019-09-07 23:27:00,219 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) submit async job-1378, details: AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType: Snapshot, instanceId: 13, cmd: org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo: {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
>> >>
>> >> 2019-09-07 23:27:00,220 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:1cee5097) Executing AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType: Snapshot, instanceId: 13, cmd: org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo: {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
>> >>
>> >> 2019-09-07 23:27:00,221 DEBUG [c.c.a.ApiServlet] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) ===END=== 192.168.254.3 -- GET command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
>> >>
>> >> 2019-09-07 23:27:00,305 DEBUG [c.c.a.m.ClusteredAgentAttache] (AgentManager-Handler-12:null) (logid:) Seq 1-8660140608456756853: Routing from 2199066247173
>> >>
>> >> 2019-09-07 23:27:00,305 DEBUG [o.a.c.s.s.XenserverSnapshotStrategy] (API-Job-Executor-2:ctx-f0843047 job-1378 ctx-f50e25a4) (logid:1cee5097) Can't find snapshot on backup storage, delete it in db
>> >>
>> >> -Jerry
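The last log line above ("Can't find snapshot on backup storage, delete it in db") matches that reading. The orphaned snapshot can also be confirmed from Java with rados-java, the same library CloudStack's KVM plugin uses - a sketch with placeholder connection values and the image UUID from Jerry's test:

    import java.util.List;

    import com.ceph.rados.IoCTX;
    import com.ceph.rados.Rados;
    import com.ceph.rbd.Rbd;
    import com.ceph.rbd.RbdImage;
    import com.ceph.rbd.jna.RbdSnapInfo;

    public class RbdSnapListCheck {
        public static void main(String[] args) throws Exception {
            Rados rados = new Rados("admin");                   // placeholder cephx user
            rados.confSet("mon_host", "ceph-mon.example.com");  // placeholder monitor
            rados.confSet("key", "<cephx key>");                // placeholder key
            rados.connect();

            IoCTX io = rados.ioCtxCreate("rbd");
            Rbd rbd = new Rbd(io);
            RbdImage image = rbd.open("ac510428-5d09-4e86-9d34-9dfab3715b7c");

            // Java equivalent of `rbd snap list`: anything printed here still
            // occupies space on primary storage even if it is gone from the DB.
            List<RbdSnapInfo> snaps = image.snapList();
            for (RbdSnapInfo snap : snaps) {
                System.out.println("snapshot still on RBD: " + snap.name);
            }

            rbd.close(image);
            rados.ioCtxDestroy(io);
        }
    }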
>> >> ________________________________
>> >> From: Andrija Panic <andrija.pa...@gmail.com>
>> >> Sent: Saturday, September 7, 2019 1:07:19 AM
>> >> To: users <users@cloudstack.apache.org>
>> >> Cc: d...@cloudstack.apache.org <d...@cloudstack.apache.org>
>> >> Subject: Re: 4.13 rbd snapshot delete failed
>> >>
>> >> storage.cleanup.delay
>> >> storage.cleanup.interval
>> >>
>> >> Set both to 60 (seconds) and wait for up to 2 minutes - the snapshot
>> >> should be deleted just fine...
>> >>
>> >> cheers
>> >>
>> >> On Fri, 6 Sep 2019 at 18:52, li jerry <div...@hotmail.com> wrote:
>> >>
>> >>> Hello All,
>> >>>
>> >>> When I tested ACS 4.13 KVM + Ceph snapshots, I found that snapshots
>> >>> could be created and rolled back (using the API alone), but deletion
>> >>> could not be completed.
>> >>>
>> >>> After executing the deletion API, the snapshot disappears from the
>> >>> snapshot list, but the snapshot on the Ceph RBD image is not deleted
>> >>> (rbd snap list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c).
>> >>>
>> >>> Is there any way we can completely delete the snapshot?
>> >>>
>> >>> -Jerry
>> >>
>> >> --
>> >>
>> >> Andrija Panić
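For completeness, the rados-java rollback call under discussion looks roughly like this - a sketch assuming rados-java 0.5.0+ (which the thread notes added rollback support) and placeholder connection values; whether CloudStack ever reaches this call is exactly what the snapshotRelPath question above is about:

    import com.ceph.rados.IoCTX;
    import com.ceph.rados.Rados;
    import com.ceph.rbd.Rbd;
    import com.ceph.rbd.RbdImage;

    public class RbdSnapRollbackSketch {
        public static void main(String[] args) throws Exception {
            Rados rados = new Rados("admin");                   // placeholder cephx user
            rados.confSet("mon_host", "ceph-mon.example.com");  // placeholder monitor
            rados.confSet("key", "<cephx key>");                // placeholder key
            rados.connect();

            IoCTX io = rados.ioCtxCreate("rbd");
            Rbd rbd = new Rbd(io);
            RbdImage image = rbd.open("<volume uuid>");

            // Reverts the image contents to the snapshot; the volume should be
            // detached or the VM stopped while this runs.
            image.snapRollBack("<snapshot name>");

            rbd.close(image);
            rados.ioCtxDestroy(io);
        }
    }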