Thx Gabriel - I've commented on the PR - needs some more love - but we're almost there!
On Thu, 3 Oct 2019 at 20:46, Gabriel Beims Bräscher <gabrasc...@gmail.com> wrote:

> Hello folks,
>
> Just pinging that I have created PR
> https://github.com/apache/cloudstack/pull/3615, addressing the snapshot
> deletion issue #3586 (https://github.com/apache/cloudstack/issues/3586).
> Please feel free to test and review.
>
> Regards,
> Gabriel.
>
> On Mon, 9 Sep 2019 at 12:08, Gabriel Beims Bräscher <gabrasc...@gmail.com> wrote:
>
> > Thanks for the feedback, Andrija and Andrei.
> >
> > I have opened issue #3590 for the snapshot rollback issue raised by Andrija.
> > I will be investigating both issues:
> > - RBD snapshot revert #3590 (https://github.com/apache/cloudstack/issues/3590)
> > - RBD snapshot deletion #3586 (https://github.com/apache/cloudstack/issues/3586)
> >
> > Cheers,
> > Gabriel
> >
> > On Mon, 9 Sep 2019 at 09:41, Andrei Mikhailovsky <and...@arhont.com> wrote:
> >
> >> A quick bit of feedback from my side: I have never had snapshot deletion
> >> work properly with Ceph. Every week or so I have to manually delete all
> >> Ceph snapshots. The NFS secondary storage snapshots, however, are deleted
> >> just fine. I have been using CloudStack for 5+ years and it has always been
> >> the case. I am currently running 4.11.2 with Ceph 13.2.6-1xenial.
> >>
> >> Andrei
> >>
> >> ----- Original Message -----
> >> > From: "Andrija Panic" <andrija.pa...@gmail.com>
> >> > To: "Gabriel Beims Bräscher" <gabrasc...@gmail.com>
> >> > Cc: "users" <users@cloudstack.apache.org>, "dev" <d...@cloudstack.apache.org>
> >> > Sent: Sunday, 8 September, 2019 19:17:59
> >> > Subject: Re: 4.13 rbd snapshot delete failed
> >>
> >> > Thx Gabriel for the extensive feedback.
> >> > Actually, my ex-company added the code to really delete an RBD snapshot
> >> > back in 2016 or so; it was part of 4.9 if I'm not mistaken. So I expect
> >> > the code is there, but probably some exception is happening, or a
> >> > regression...
> >> >
> >> > Cheers
> >> >
> >> > On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher <gabrasc...@gmail.com> wrote:
> >> >
> >> >> Thanks for the feedback, Andrija. It looks like delete was not totally
> >> >> supported then (am I missing something?). I will take a look into this
> >> >> and open a PR adding proper support for RBD snapshot deletion if
> >> >> necessary.
> >> >>
> >> >> Regarding the rollback, I have tested it several times and it worked;
> >> >> however, I see a weak point in the Ceph rollback implementation.
> >> >>
> >> >> It looks like Li Jerry was able to execute the rollback without any
> >> >> problem. Li, could you please post here the log output: "Attempting to
> >> >> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
> >> >> [snapshotid:%s]"? Andrija will not be able to see that log, as the
> >> >> exception happens prior to it; the only way for him to check those
> >> >> values is via remote debugging. If you are able to post those values,
> >> >> it would also help in sorting out what is wrong.
> >> >>
> >> >> I am checking the code base, running a few tests, and evaluating the
> >> >> log that you (Andrija) sent. What I can say for now is that the
> >> >> parameter "snapshotRelPath = snapshot.getPath()" [1] is a critical
> >> >> piece of code that can definitely break the rollback execution flow.
> >> >> My tests had pointed to one pattern, but now I see other possibilities.
> >> >> I will probably add a few parameters to the rollback/revert command
> >> >> instead of using the path, or review the path life-cycle and the
> >> >> different execution flows, in order to make it safer to use.
> >> >>
> >> >> [1] https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
> >> >>
> >> >> A few details on the test environments and the Ceph/RBD versions:
> >> >> - The CloudStack, KVM, and Ceph nodes are running Ubuntu 18.04.
> >> >> - Ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable).
> >> >> - RADOS Block Devices have had snapshot rollback support since Ceph
> >> >>   v10.0.2 [https://github.com/ceph/ceph/pull/6878].
> >> >> - rados-java [https://github.com/ceph/rados-java] has supported snapshot
> >> >>   rollback since 0.5.0; rados-java 0.5.0 is the version used by
> >> >>   CloudStack 4.13.0.0.
> >> >>
> >> >> I will post an update here soon.
> >> >>
> >> >> On Sun, 8 Sep 2019 at 12:28, Wido den Hollander <w...@widodh.nl> wrote:
> >> >>
> >> >>> On 9/8/19 5:26 AM, Andrija Panic wrote:
> >> >>> > Many releases ago, deleting a Ceph volume snapshot was also only
> >> >>> > deleting it in the DB, so RBD performance became terrible with many
> >> >>> > tens of (i.e. hourly) snapshots. I'll try to verify this on 4.13
> >> >>> > myself, but Wido and the guys will know better...
> >> >>>
> >> >>> I pinged Gabriel and he's looking into it. He'll get back to it.
> >> >>>
> >> >>> Wido
> >> >>>
> >> >>> > On Sat, Sep 7, 2019, 08:34 li jerry <div...@hotmail.com> wrote:
> >> >>> >
> >> >>> >> I found it has nothing to do with storage.cleanup.delay and
> >> >>> >> storage.cleanup.interval.
> >> >>> >>
> >> >>> >> The reason is that when DeleteSnapshotCmd is executed, because the
> >> >>> >> RBD snapshot has no copy on secondary storage, it only changes the
> >> >>> >> database information and never goes to the primary storage to
> >> >>> >> delete the snapshot.
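The leftover state described above can be double-checked from the Ceph side. Below is a minimal plain-shell sketch of that check; the `rbd snap ls` column layout (SNAPID, NAME, SIZE) is an assumption, and the sample output is stubbed rather than taken from a real cluster, so the snippet runs self-contained. Against a live cluster you would pipe `rbd snap ls <pool>/<image>` into `snap_exists` instead.

```shell
# snap_exists: read "rbd snap ls" text on stdin, succeed if the snapshot
# named in $1 is listed. Skips the header row, matches the NAME column.
snap_exists() {
    awk -v snap="$1" 'NR > 1 && $2 == snap { found = 1 } END { exit !found }'
}

# Stubbed sample output (illustrative; a real run would use the rbd CLI).
sample_snap_ls='SNAPID NAME SIZE
    13 0b50eb7e-4f42-4de7-96c2-1fae137c8c9f 20GiB'

if printf '%s\n' "$sample_snap_ls" | snap_exists 0b50eb7e-4f42-4de7-96c2-1fae137c8c9f; then
    echo "snapshot still present on primary storage"
else
    echo "snapshot gone"
fi
```

If the management server reports the snapshot deleted but `snap_exists` still succeeds, the image carries an orphaned RBD snapshot, which is the symptom reported in issue #3586.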
> >> >>> >>
> >> >>> >> Log ===========================
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,118 DEBUG [c.c.a.ApiServlet] (qtp504527234-17:ctx-2e407b61) (logid:445cbea8) ===START=== 192.168.254.3 -- GET command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,139 DEBUG [c.c.a.ApiServer] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) CIDRs from which account 'Acct[2f96c108-9408-11e9-a820-0200582b001a-admin]' is allowed to perform API calls: 0.0.0.0/0,::/0
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,204 DEBUG [c.c.a.ApiServer] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) Retrieved cmdEventType from job info: SNAPSHOT.DELETE
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,217 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:c34a368a) Add job-1378 into job monitoring
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,219 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) submit async job-1378, details: AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType: Snapshot, instanceId: 13, cmd: org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo: {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,220 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:1cee5097) Executing AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType: Snapshot, instanceId: 13, cmd: org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo: {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,221 DEBUG [c.c.a.ApiServlet] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) ===END=== 192.168.254.3 -- GET command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,305 DEBUG [c.c.a.m.ClusteredAgentAttache] (AgentManager-Handler-12:null) (logid:) Seq 1-8660140608456756853: Routing from 2199066247173
> >> >>> >>
> >> >>> >> 2019-09-07 23:27:00,305 DEBUG [o.a.c.s.s.XenserverSnapshotStrategy] (API-Job-Executor-2:ctx-f0843047 job-1378 ctx-f50e25a4) (logid:1cee5097) Can't find snapshot on backup storage, delete it in db
> >> >>> >>
> >> >>> >> -Jerry
> >> >>> >>
> >> >>> >> ________________________________
> >> >>> >> From: Andrija Panic <andrija.pa...@gmail.com>
> >> >>> >> Sent: Saturday, September 7, 2019 1:07:19 AM
> >> >>> >> To: users <users@cloudstack.apache.org>
> >> >>> >> Cc: d...@cloudstack.apache.org <d...@cloudstack.apache.org>
> >> >>> >> Subject: Re: 4.13 rbd snapshot delete failed
> >> >>> >>
> >> >>> >> storage.cleanup.delay
> >> >>> >> storage.cleanup.interval
> >> >>> >>
> >> >>> >> Set both to 60 (seconds) and wait for up to 2 minutes - the snapshot
> >> >>> >> should be deleted just fine...
> >> >>> >>
> >> >>> >> cheers
> >> >>> >>
> >> >>> >> On Fri, 6 Sep 2019 at 18:52, li jerry <div...@hotmail.com> wrote:
> >> >>> >>
> >> >>> >>> Hello All,
> >> >>> >>>
> >> >>> >>> When I tested snapshots on ACS 4.13 with KVM + Ceph, I found that
> >> >>> >>> snapshots could be created and rolled back (using the API alone),
> >> >>> >>> but deletion could not be completed.
> >> >>> >>>
> >> >>> >>> After executing the deletion API, the snapshot disappears from
> >> >>> >>> listSnapshots, but the snapshot on Ceph RBD is not deleted
> >> >>> >>> (rbd snap list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c).
> >> >>> >>>
> >> >>> >>> Is there any way we can completely delete the snapshot?
> >> >>> >>>
> >> >>> >>> -Jerry
> >> >>> >>
> >> >>> >> --
> >> >>> >> Andrija Panić

--
Andrija Panić
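As a stopgap for the kind of weekly manual cleanup Andrei mentions above, the per-image snapshot removal can be scripted. The sketch below is a dry run in plain shell: it only prints one `rbd snap purge` command per image so the list can be reviewed first. The pool name and image names here are illustrative assumptions; on a live cluster the image list would come from `rbd ls "$POOL"`.

```shell
# Dry-run cleanup: print (do not execute) an "rbd snap purge" command
# for every image name read from stdin. POOL is an assumed example.
POOL=rbd

emit_purge_commands() {
    while read -r image; do
        if [ -n "$image" ]; then
            printf 'rbd snap purge %s/%s\n' "$POOL" "$image"
        fi
    done
}

# Stubbed image list; on a real cluster: rbd ls "$POOL" | emit_purge_commands
printf '%s\n' ac510428-5d09-4e86-9d34-9dfab3715b7c another-volume | emit_purge_commands
```

After reviewing the printed commands they could be piped to sh. Note that `rbd snap purge` removes every snapshot on an image, so this only fits the "delete all Ceph snapshots" case; a selective cleanup would use `rbd snap rm <pool>/<image>@<snap>` per snapshot instead.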