Re: 4.13 rbd snapshot delete failed

2019-10-03 Thread Andrija Panic
Thx Gabriel - I've commented on the PR; it needs some more love, but we're
almost there!

On Thu, 3 Oct 2019 at 20:46, Gabriel Beims Bräscher 
wrote:

> Hello folks,
>
> Just pinging that I have created PR
> https://github.com/apache/cloudstack/pull/3615 addressing the snapshot
> deletion issue #3586 (https://github.com/apache/cloudstack/issues/3586).
> Please, feel free to test and review.
>
> Regards,
> Gabriel.
>
> > On Mon, Sep 9, 2019 at 12:08, Gabriel Beims Bräscher <
> > gabrasc...@gmail.com> wrote:
>
> > Thanks for the feedback Andrija and Andrei.
> >
> > I have opened issue #3590 for the snapshot rollback issue raised by
> > Andrija.
> > I will be investigating both issues:
> > - RBD snapshot Revert #3590 (
> > https://github.com/apache/cloudstack/issues/3590)
> > - RBD snapshot deletion #3586 (
> > https://github.com/apache/cloudstack/issues/3586)
> >
> > Cheers,
> > Gabriel
> >
> > On Mon, Sep 9, 2019 at 09:41, Andrei Mikhailovsky <
> > and...@arhont.com>
> > wrote:
> >
> >> A quick feedback from my side. I've never had a properly working delete
> >> snapshot with ceph. Every week or so I have to manually delete all ceph
> >> snapshots. However, the NFS secondary storage snapshots are deleted just
> >> fine. I've been using CloudStack for 5+ years and it was always the
> case. I
> >> am currently running 4.11.2 with ceph 13.2.6-1xenial.
> >>
> >> Andrei
> >>
> >> - Original Message -
> >> > From: "Andrija Panic" 
> >> > To: "Gabriel Beims Bräscher" 
> >> > Cc: "users" , "dev" <
> >> d...@cloudstack.apache.org>
> >> > Sent: Sunday, 8 September, 2019 19:17:59
> >> > Subject: Re: 4.13 rbd snapshot delete failed
> >>
> >> > Thx Gabriel for the extensive feedback.
> >> > Actually, my ex-company added the code to really delete an RBD snap back in
> >> > 2016 or so; it was part of 4.9 if I'm not mistaken. So I expect the code is there,
> >> > but probably some exception is happening, or there is a regression...
> >> >
> >> > Cheers
> >> >
> >> > On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher <
> gabrasc...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> >> Thanks for the feedback, Andrija. It looks like delete was not totally
> >> >> supported then (am I missing something?). I will take a look into this and
> >> >> open a PR adding proper support for RBD snapshot deletion if necessary.
> >> >>
> >> >> Regarding the rollback, I have tested it several times and it worked;
> >> >> however, I see a weak point on the Ceph rollback implementation.
> >> >>
> >> >> It looks like Li Jerry was able to execute the rollback without any
> >> >> problem. Li, could you please post the log output here: "Attempting to
> >> >> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
> >> >> [snapshotid:%s]"? Andrija will not be able to see that log, as the exception
> >> >> happens prior to it; the only way for you to check those values is via remote
> >> >> debugging. If you are able to post those values, it would also help in
> >> >> sorting out what is wrong.
> >> >>
> >> >> I am checking the code base, running a few tests, and evaluating the log
> >> >> that you (Andrija) sent. What I can say for now is that it looks like the
> >> >> parameter "snapshotRelPath = snapshot.getPath()" [1] is a critical piece of
> >> >> code that can definitely break the rollback execution flow. My tests had
> >> >> pointed to a pattern, but now I see other possibilities. I will probably
> >> >> add a few parameters to the rollback/revert command instead of using the
> >> >> path, or review the path life-cycle and the different execution flows, in
> >> >> order to make it safer to use.
> >> >> [1]
> >> >>
> >>
> https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
> >> >>
> >> >> A few details on the test environments and Ce

Re: 4.13 rbd snapshot delete failed

2019-10-03 Thread Gabriel Beims Bräscher
Hello folks,

Just pinging that I have created PR
https://github.com/apache/cloudstack/pull/3615 addressing the snapshot
deletion issue #3586 (https://github.com/apache/cloudstack/issues/3586).
Please, feel free to test and review.

Regards,
Gabriel.

On Mon, Sep 9, 2019 at 12:08, Gabriel Beims Bräscher <
gabrasc...@gmail.com> wrote:

> Thanks for the feedback Andrija and Andrei.
>
> I have opened issue #3590 for the snapshot rollback issue raised by
> Andrija.
> I will be investigating both issues:
> - RBD snapshot Revert #3590 (
> https://github.com/apache/cloudstack/issues/3590)
> - RBD snapshot deletion #3586 (
> https://github.com/apache/cloudstack/issues/3586)
>
> Cheers,
> Gabriel
>
> On Mon, Sep 9, 2019 at 09:41, Andrei Mikhailovsky 
> wrote:
>
>> A quick feedback from my side. I've never had a properly working delete
>> snapshot with ceph. Every week or so I have to manually delete all ceph
>> snapshots. However, the NFS secondary storage snapshots are deleted just
>> fine. I've been using CloudStack for 5+ years and it was always the case. I
>> am currently running 4.11.2 with ceph 13.2.6-1xenial.
>>
>> Andrei
>>
>> - Original Message -
>> > From: "Andrija Panic" 
>> > To: "Gabriel Beims Bräscher" 
>> > Cc: "users" , "dev" <
>> d...@cloudstack.apache.org>
>> > Sent: Sunday, 8 September, 2019 19:17:59
>> > Subject: Re: 4.13 rbd snapshot delete failed
>>
>> > Thx Gabriel for the extensive feedback.
>> > Actually, my ex-company added the code to really delete an RBD snap back in
>> > 2016 or so; it was part of 4.9 if I'm not mistaken. So I expect the code is there,
>> > but probably some exception is happening, or there is a regression...
>> >
>> > Cheers
>> >
>> > On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher > >
>> > wrote:
>> >
>> >> Thanks for the feedback, Andrija. It looks like delete was not totally
>> >> supported then (am I missing something?). I will take a look into this and
>> >> open a PR adding proper support for RBD snapshot deletion if necessary.
>> >>
>> >> Regarding the rollback, I have tested it several times and it worked;
>> >> however, I see a weak point on the Ceph rollback implementation.
>> >>
>> >> It looks like Li Jerry was able to execute the rollback without any
>> >> problem. Li, could you please post the log output here: "Attempting to
>> >> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
>> >> [snapshotid:%s]"? Andrija will not be able to see that log, as the exception
>> >> happens prior to it; the only way for you to check those values is via remote
>> >> debugging. If you are able to post those values, it would also help in
>> >> sorting out what is wrong.
>> >>
>> >> I am checking the code base, running a few tests, and evaluating the log
>> >> that you (Andrija) sent. What I can say for now is that it looks like the
>> >> parameter "snapshotRelPath = snapshot.getPath()" [1] is a critical piece of
>> >> code that can definitely break the rollback execution flow. My tests had
>> >> pointed to a pattern, but now I see other possibilities. I will probably
>> >> add a few parameters to the rollback/revert command instead of using the
>> >> path, or review the path life-cycle and the different execution flows, in
>> >> order to make it safer to use.
>> >> [1]
>> >>
>> https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
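
A rough sketch of that idea - carrying the RBD coordinates explicitly on the
revert command rather than deriving them from snapshot.getPath(). The class and
field names below are hypothetical, made up purely to illustrate the shape, and
are not the actual CloudStack command classes:

// Hypothetical command shape: the agent receives pool, image, and snapshot
// names directly, so nothing has to be parsed out of the snapshot path anymore.
public class RevertRbdSnapshotCommandSketch {

    private final String pool;         // Ceph pool that holds the volume
    private final String volumeUuid;   // RBD image name of the volume on primary storage
    private final String snapshotName; // RBD snapshot to roll back to

    public RevertRbdSnapshotCommandSketch(String pool, String volumeUuid, String snapshotName) {
        this.pool = pool;
        this.volumeUuid = volumeUuid;
        this.snapshotName = snapshotName;
    }

    public String getPool() { return pool; }
    public String getVolumeUuid() { return volumeUuid; }
    public String getSnapshotName() { return snapshotName; }
}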
>> >>
>> >> A few details on the test environments and Ceph/RBD version:
>> >> CloudStack, KVM, and Ceph nodes are running with Ubuntu 18.04
>> >> Ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
>> >> (stable)
>> >> RADOS Block Device (RBD) has had snapshot rollback support since Ceph v10.0.2 [
>> >> https://github.com/ceph/ceph/pull/6878]
>> >> Rados-java [https://github.com/ceph/rados-java] supports snapshot
>> >> rollback since 0.5.0; rados-java 0.5.0 is the version used by
>> CloudStack
>> >> 4.13.0.0
>> >>
>> >> I will be updating here soon.
>> >>
> >> On Sun, Sep 8, 2019 at 12:28, Wido den Hol

Re: 4.13 rbd snapshot delete failed

2019-09-09 Thread Gabriel Beims Bräscher
Thanks for the feedback Andrija and Andrei.

I have opened issue #3590 for the snapshot rollback issue raised by
Andrija.
I will be investigating both issues:
- RBD snapshot Revert #3590 (
https://github.com/apache/cloudstack/issues/3590)
- RBD snapshot deletion #3586 (
https://github.com/apache/cloudstack/issues/3586)

Cheers,
Gabriel

On Mon, Sep 9, 2019 at 09:41, Andrei Mikhailovsky 
wrote:

> A quick feedback from my side. I've never had a properly working delete
> snapshot with ceph. Every week or so I have to manually delete all ceph
> snapshots. However, the NFS secondary storage snapshots are deleted just
> fine. I've been using CloudStack for 5+ years and it was always the case. I
> am currently running 4.11.2 with ceph 13.2.6-1xenial.
>
> Andrei
>
> - Original Message -
> > From: "Andrija Panic" 
> > To: "Gabriel Beims Bräscher" 
> > Cc: "users" , "dev" <
> d...@cloudstack.apache.org>
> > Sent: Sunday, 8 September, 2019 19:17:59
> > Subject: Re: 4.13 rbd snapshot delete failed
>
> > Thx Gabriel for the extensive feedback.
> > Actually, my ex-company added the code to really delete an RBD snap back in
> > 2016 or so; it was part of 4.9 if I'm not mistaken. So I expect the code is there,
> > but probably some exception is happening, or there is a regression...
> >
> > Cheers
> >
> > On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher 
> > wrote:
> >
> >> Thanks for the feedback, Andrija. It looks like delete was not totally
> >> supported then (am I missing something?). I will take a look into this and
> >> open a PR adding proper support for RBD snapshot deletion if necessary.
> >>
> >> Regarding the rollback, I have tested it several times and it worked;
> >> however, I see a weak point on the Ceph rollback implementation.
> >>
> >> It looks like Li Jerry was able to execute the rollback without any
> >> problem. Li, could you please post the log output here: "Attempting to
> >> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
> >> [snapshotid:%s]"? Andrija will not be able to see that log, as the exception
> >> happens prior to it; the only way for you to check those values is via remote
> >> debugging. If you are able to post those values, it would also help in
> >> sorting out what is wrong.
> >>
> >> I am checking the code base, running a few tests, and evaluating the log
> >> that you (Andrija) sent. What I can say for now is that it looks like the
> >> parameter "snapshotRelPath = snapshot.getPath()" [1] is a critical piece of
> >> code that can definitely break the rollback execution flow. My tests had
> >> pointed to a pattern, but now I see other possibilities. I will probably
> >> add a few parameters to the rollback/revert command instead of using the
> >> path, or review the path life-cycle and the different execution flows, in
> >> order to make it safer to use.
> >> [1]
> >>
> https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
> >>
> >> A few details on the test environments and Ceph/RBD version:
> >> CloudStack, KVM, and Ceph nodes are running with Ubuntu 18.04
> >> Ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
> >> (stable)
> >> RADOS Block Device (RBD) has had snapshot rollback support since Ceph v10.0.2 [
> >> https://github.com/ceph/ceph/pull/6878]
> >> Rados-java [https://github.com/ceph/rados-java] supports snapshot
> >> rollback since 0.5.0; rados-java 0.5.0 is the version used by CloudStack
> >> 4.13.0.0
> >>
> >> I will be updating here soon.
> >>
> >> On Sun, Sep 8, 2019 at 12:28, Wido den Hollander 
> >> wrote:
> >>
> >>>
> >>>
> >>> On 9/8/19 5:26 AM, Andrija Panic wrote:
> >>> > Maaany release ago, deleting Ceph volume snap, was also only deleting
> >>> it in
> >>> > DB, so the RBD performance become terrible with many tens of (i. e.
> >>> Hourly)
> >>> > snapshots. I'll try to verify this on 4.13 myself, but Wido and the
> guys
> >>> > will know better...
> >>>
> >>> I pinged Gabriel and he's looking into it. He'll get back to it.
> >>>
> >>> Wido
> >>>
> >>> >
> >>> > I
> >>> >
> >>> > On Sat, Sep 7, 2019, 08:34 li

Re: 4.13 rbd snapshot delete failed

2019-09-09 Thread Andrei Mikhailovsky
Some quick feedback from my side: I've never had properly working snapshot
deletion with Ceph. Every week or so I have to manually delete all Ceph
snapshots. However, the NFS secondary storage snapshots are deleted just fine.
I've been using CloudStack for 5+ years and it has always been the case. I am
currently running 4.11.2 with Ceph 13.2.6-1xenial.

Andrei

- Original Message -
> From: "Andrija Panic" 
> To: "Gabriel Beims Bräscher" 
> Cc: "users" , "dev" 
> Sent: Sunday, 8 September, 2019 19:17:59
> Subject: Re: 4.13 rbd snapshot delete failed

> Thx Gabriel for the extensive feedback.
> Actually, my ex-company added the code to really delete an RBD snap back in
> 2016 or so; it was part of 4.9 if I'm not mistaken. So I expect the code is there,
> but probably some exception is happening, or there is a regression...
> 
> Cheers
> 
> On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher 
> wrote:
> 
>> Thanks for the feedback, Andrija. It looks like delete was not totally
>> supported then (am I missing something?). I will take a look into this and
>> open a PR adding proper support for RBD snapshot deletion if necessary.
>>
>> Regarding the rollback, I have tested it several times and it worked;
>> however, I see a weak point on the Ceph rollback implementation.
>>
>> It looks like Li Jerry was able to execute the rollback without any
>> problem. Li, could you please post the log output here: "Attempting to
>> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
>> [snapshotid:%s]"? Andrija will not be able to see that log, as the exception
>> happens prior to it; the only way for you to check those values is via remote
>> debugging. If you are able to post those values, it would also help in
>> sorting out what is wrong.
>>
>> I am checking the code base, running a few tests, and evaluating the log
>> that you (Andrija) sent. What I can say for now is that it looks like the
>> parameter "snapshotRelPath = snapshot.getPath()" [1] is a critical piece of
>> code that can definitely break the rollback execution flow. My tests had
>> pointed to a pattern, but now I see other possibilities. I will probably
>> add a few parameters to the rollback/revert command instead of using the
>> path, or review the path life-cycle and the different execution flows, in
>> order to make it safer to use.
>> [1]
>> https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
>>
>> A few details on the test environments and Ceph/RBD version:
>> CloudStack, KVM, and Ceph nodes are running with Ubuntu 18.04
>> Ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
>> (stable)
>> RADOS Block Device (RBD) has had snapshot rollback support since Ceph v10.0.2 [
>> https://github.com/ceph/ceph/pull/6878]
>> Rados-java [https://github.com/ceph/rados-java] supports snapshot
>> rollback since 0.5.0; rados-java 0.5.0 is the version used by CloudStack
>> 4.13.0.0
>>
>> I will be updating here soon.
>>
>> On Sun, Sep 8, 2019 at 12:28, Wido den Hollander 
>> wrote:
>>
>>>
>>>
>>> On 9/8/19 5:26 AM, Andrija Panic wrote:
>>> > Maaany release ago, deleting Ceph volume snap, was also only deleting
>>> it in
>>> > DB, so the RBD performance become terrible with many tens of (i. e.
>>> Hourly)
>>> > snapshots. I'll try to verify this on 4.13 myself, but Wido and the guys
>>> > will know better...
>>>
>>> I pinged Gabriel and he's looking into it. He'll get back to it.
>>>
>>> Wido
>>>
>>> >
>>> > I
>>> >
>>> > On Sat, Sep 7, 2019, 08:34 li jerry  wrote:
>>> >
>>> >> I found it had nothing to do with  storage.cleanup.delay and
>>> >> storage.cleanup.interval.
>>> >>
>>> >>
>>> >>
>>> >> The reason is that when DeleteSnapshotCmd is executed, because the RBD
>>> >> snapshot has no copy on secondary storage, it only changes the
>>> >> database information and never goes to the primary storage to delete the
>>> >> snapshot.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Log===
>>> >>
>>> >>
>>> >>
>>> >> 2019-09-07 23:27:00,118 DEBUG [c.c.a.ApiServlet]
>>> >> (qtp504527234-17:ctx-2e407

Re: 4.13 rbd snapshot delete failed

2019-09-08 Thread Andrija Panic
onitoring
>> >>
>> >> 2019-09-07 23:27:00,219 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) submit
>> async
>> >> job-1378, details: AsyncJobVO {id:1378, userId: 2, accountId: 2,
>> >> instanceType: Snapshot, instanceId: 13, cmd:
>> >> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd,
>> cmdInfo:
>> >>
>> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
>> >>
>> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
>> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
>> >> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
>> >> null, lastPolled: null, created: null, removed: null}
>> >>
>> >> 2019-09-07 23:27:00,220 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> >> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:1cee5097) Executing
>> >> AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType: Snapshot,
>> >> instanceId: 13, cmd:
>> >> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd,
>> cmdInfo:
>> >>
>> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
>> >>
>> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
>> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
>> >> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
>> >> null, lastPolled: null, created: null, removed: null}
>> >>
>> >> 2019-09-07 23:27:00,221 DEBUG [c.c.a.ApiServlet]
>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) ===END===
>> >> 192.168.254.3 -- GET
>> >>
>> command=deleteSnapshot=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f=json&_=1567869534480
>> >>
>> >> 2019-09-07 23:27:00,305 DEBUG [c.c.a.m.ClusteredAgentAttache]
>> >> (AgentManager-Handler-12:null) (logid:) Seq 1-8660140608456756853:
>> Routing
>> >> from 2199066247173
>> >>
>> >> 2019-09-07 23:27:00,305 DEBUG [o.a.c.s.s.XenserverSnapshotStrategy]
>> >> (API-Job-Executor-2:ctx-f0843047 job-1378 ctx-f50e25a4)
>> (logid:1cee5097)
>> >> Can't find snapshot on backup storage, delete it in db
>> >>
>> >>
>> >>
>> >> -Jerry
>> >>
>> >>
>> >>
>> >> 
>> >> From: Andrija Panic 
>> >> Sent: Saturday, September 7, 2019 1:07:19 AM
>> >> To: users 
>> >> Cc: d...@cloudstack.apache.org 
>> >> Subject: Re: 4.13 rbd snapshot delete failed
>> >>
>> >> storage.cleanup.delay
>> >> storage.cleanup.interval
>> >>
>> >> put both to 60 (seconds) and wait for up to 2min - should be deleted
>> just
>> >> fine...
>> >>
>> >> cheers
>> >>
>> >> On Fri, 6 Sep 2019 at 18:52, li jerry  wrote:
>> >>
>> >>> Hello All
>> >>>
>> >>> When I tested ACS 4.13 KVM + CEPH snapshot, I found that snapshots
>> could
>> >>> be created and rolled back (using API alone), but deletion could not
>> be
>> >>> completed.
>> >>>
>> >>>
>> >>>
>> >>> After executing the deletion API, the snapshot will disappear from the
>> >>> list Snapshots, but the snapshot on CEPH RBD will not be deleted (rbd
>> >> snap
>> >>> list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c)
>> >>>
>> >>>
>> >>>
>> >>> Is there any way we can completely delete the snapshot?
>> >>>
>> >>> -Jerry
>> >>>
>> >>>
>> >>
>> >> --
>> >>
>> >> Andrija Panić
>> >>
>> >
>>
>


Re: 4.13 rbd snapshot delete failed

2019-09-08 Thread Gabriel Beims Bräscher
age.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
> >> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
> >> null, lastPolled: null, created: null, removed: null}
> >>
> >> 2019-09-07 23:27:00,220 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> >> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:1cee5097) Executing
> >> AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType: Snapshot,
> >> instanceId: 13, cmd:
> >> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd,
> cmdInfo:
> >>
> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
> >>
> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
> >> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
> >> null, lastPolled: null, created: null, removed: null}
> >>
> >> 2019-09-07 23:27:00,221 DEBUG [c.c.a.ApiServlet]
> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) ===END===
> >> 192.168.254.3 -- GET
> >>
> command=deleteSnapshot=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f=json&_=1567869534480
> >>
> >> 2019-09-07 23:27:00,305 DEBUG [c.c.a.m.ClusteredAgentAttache]
> >> (AgentManager-Handler-12:null) (logid:) Seq 1-8660140608456756853:
> Routing
> >> from 2199066247173
> >>
> >> 2019-09-07 23:27:00,305 DEBUG [o.a.c.s.s.XenserverSnapshotStrategy]
> >> (API-Job-Executor-2:ctx-f0843047 job-1378 ctx-f50e25a4) (logid:1cee5097)
> >> Can't find snapshot on backup storage, delete it in db
> >>
> >>
> >>
> >> -Jerry
> >>
> >>
> >>
> >> 
> >> From: Andrija Panic 
> >> Sent: Saturday, September 7, 2019 1:07:19 AM
> >> To: users 
> >> Cc: d...@cloudstack.apache.org 
> >> Subject: Re: 4.13 rbd snapshot delete failed
> >>
> >> storage.cleanup.delay
> >> storage.cleanup.interval
> >>
> >> put both to 60 (seconds) and wait for up to 2min - should be deleted
> just
> >> fine...
> >>
> >> cheers
> >>
> >> On Fri, 6 Sep 2019 at 18:52, li jerry  wrote:
> >>
> >>> Hello All
> >>>
> >>> When I tested ACS 4.13 KVM + CEPH snapshot, I found that snapshots
> could
> >>> be created and rolled back (using API alone), but deletion could not be
> >>> completed.
> >>>
> >>>
> >>>
> >>> After executing the deletion API, the snapshot will disappear from the
> >>> list Snapshots, but the snapshot on CEPH RBD will not be deleted (rbd
> >> snap
> >>> list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c)
> >>>
> >>>
> >>>
> >>> Is there any way we can completely delete the snapshot?
> >>>
> >>> -Jerry
> >>>
> >>>
> >>
> >> --
> >>
> >> Andrija Panić
> >>
> >
>


Re: 4.13 rbd snapshot delete failed

2019-09-08 Thread Wido den Hollander



On 9/8/19 5:26 AM, Andrija Panic wrote:
> Maaany release ago, deleting Ceph volume snap, was also only deleting it in
> DB, so the RBD performance become terrible with many tens of (i. e. Hourly)
> snapshots. I'll try to verify this on 4.13 myself, but Wido and the guys
> will know better...

I pinged Gabriel and he's looking into it. He'll get back to it.

Wido

> 
> I
> 
> On Sat, Sep 7, 2019, 08:34 li jerry  wrote:
> 
>> I found it had nothing to do with  storage.cleanup.delay and
>> storage.cleanup.interval.
>>
>>
>>
>> The reason is that when DeleteSnapshotCmd is executed, because the RBD
>> snapshot has no copy on secondary storage, it only changes the
>> database information and never goes to the primary storage to delete the
>> snapshot.
>>
>>
>>
>>
>>
>> Log===
>>
>>
>>
>> 2019-09-07 23:27:00,118 DEBUG [c.c.a.ApiServlet]
>> (qtp504527234-17:ctx-2e407b61) (logid:445cbea8) ===START===  192.168.254.3
>> -- GET
>> command=deleteSnapshot=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f=json&_=1567869534480
>>
>> 2019-09-07 23:27:00,139 DEBUG [c.c.a.ApiServer]
>> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) CIDRs from
>> which account 'Acct[2f96c108-9408-11e9-a820-0200582b001a-admin]' is allowed
>> to perform API calls: 0.0.0.0/0,::/0
>>
>> 2019-09-07 23:27:00,204 DEBUG [c.c.a.ApiServer]
>> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) Retrieved
>> cmdEventType from job info: SNAPSHOT.DELETE
>>
>> 2019-09-07 23:27:00,217 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:c34a368a) Add job-1378
>> into job monitoring
>>
>> 2019-09-07 23:27:00,219 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) submit async
>> job-1378, details: AsyncJobVO {id:1378, userId: 2, accountId: 2,
>> instanceType: Snapshot, instanceId: 13, cmd:
>> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo:
>> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
>> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
>> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
>> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
>> null, lastPolled: null, created: null, removed: null}
>>
>> 2019-09-07 23:27:00,220 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:1cee5097) Executing
>> AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType: Snapshot,
>> instanceId: 13, cmd:
>> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo:
>> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
>> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
>> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
>> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
>> null, lastPolled: null, created: null, removed: null}
>>
>> 2019-09-07 23:27:00,221 DEBUG [c.c.a.ApiServlet]
>> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) ===END===
>> 192.168.254.3 -- GET
>> command=deleteSnapshot=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f=json&_=1567869534480
>>
>> 2019-09-07 23:27:00,305 DEBUG [c.c.a.m.ClusteredAgentAttache]
>> (AgentManager-Handler-12:null) (logid:) Seq 1-8660140608456756853: Routing
>> from 2199066247173
>>
>> 2019-09-07 23:27:00,305 DEBUG [o.a.c.s.s.XenserverSnapshotStrategy]
>> (API-Job-Executor-2:ctx-f0843047 job-1378 ctx-f50e25a4) (logid:1cee5097)
>> Can't find snapshot on backup storage, delete it in db
>>
>>
>>
>> -Jerry
>>
>>
>>
>> 
>> From: Andrija Panic 
>> Sent: Saturday, September 7, 2019 1:07:19 AM
>> To: users 
>> Cc: d...@cloudstack.apache.org 
>> Subject: Re: 4.13 rbd snapshot delete failed
>>
>> storage.cleanup.delay
>> storage.cleanup.interval
>>
>> put both to 60 (seconds) and wait for up to 2min - should be deleted just
>> fine...
>>
>> cheers
>>
>> On Fri, 6 Sep 2019 at 18:52, li jerry  wrote:
>>
>>> Hello All
>>>
>>> When I tested ACS 4.13 KVM + CEPH snapshot, I found that snapshots could
>>> be created and rolled back (using API alone), but deletion could not be
>>> completed.
>>>
>>>
>>>
>>> After executing the deletion API, the snapshot will disappear from the
>>> list Snapshots, but the snapshot on CEPH RBD will not be deleted (rbd
>> snap
>>> list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c)
>>>
>>>
>>>
>>> Is there any way we can completely delete the snapshot?
>>>
>>> -Jerry
>>>
>>>
>>
>> --
>>
>> Andrija Panić
>>
> 


Re: 4.13 rbd snapshot delete failed

2019-09-07 Thread Andrija Panic
Many releases ago, deleting a Ceph volume snapshot was also only deleting it in
the DB, so RBD performance became terrible with many tens of (i.e. hourly)
snapshots. I'll try to verify this on 4.13 myself, but Wido and the guys
will know better...

I
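
To check whether snapshots really are piling up on the images in a pool, a
minimal rados-java sketch along these lines should do. rados-java is the
library the KVM agent already uses; the monitor address, cephx key, and pool
name below are placeholders, and the calls reflect my reading of the rados-java
API rather than anything authoritative:

import java.util.List;

import com.ceph.rados.IoCTX;
import com.ceph.rados.Rados;
import com.ceph.rbd.Rbd;
import com.ceph.rbd.RbdImage;
import com.ceph.rbd.jna.RbdSnapInfo;

public class RbdSnapshotCount {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details - replace with your monitors, key, and pool.
        Rados rados = new Rados("admin");
        rados.confSet("mon_host", "10.0.0.1:6789");
        rados.confSet("key", "<cephx-key>");
        rados.connect();

        IoCTX io = rados.ioCtxCreate("rbd");
        Rbd rbd = new Rbd(io);

        // Walk every image in the pool and report how many snapshots are left on it.
        for (String name : rbd.list()) {
            RbdImage image = rbd.open(name);
            List<RbdSnapInfo> snaps = image.snapList();
            System.out.println(name + ": " + snaps.size() + " snapshot(s)");
            rbd.close(image);
        }

        rados.ioCtxDestroy(io);
    }
}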

On Sat, Sep 7, 2019, 08:34 li jerry  wrote:

> I found it had nothing to do with  storage.cleanup.delay and
> storage.cleanup.interval.
>
>
>
> The reason is that when DeleteSnapshotCmd is executed, because the RBD
> snapshot has no copy on secondary storage, it only changes the
> database information and never goes to the primary storage to delete the
> snapshot.
>
>
>
>
>
> Log===
>
>
>
> 2019-09-07 23:27:00,118 DEBUG [c.c.a.ApiServlet]
> (qtp504527234-17:ctx-2e407b61) (logid:445cbea8) ===START===  192.168.254.3
> -- GET
> command=deleteSnapshot=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f=json&_=1567869534480
>
> 2019-09-07 23:27:00,139 DEBUG [c.c.a.ApiServer]
> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) CIDRs from
> which account 'Acct[2f96c108-9408-11e9-a820-0200582b001a-admin]' is allowed
> to perform API calls: 0.0.0.0/0,::/0
>
> 2019-09-07 23:27:00,204 DEBUG [c.c.a.ApiServer]
> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) Retrieved
> cmdEventType from job info: SNAPSHOT.DELETE
>
> 2019-09-07 23:27:00,217 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:c34a368a) Add job-1378
> into job monitoring
>
> 2019-09-07 23:27:00,219 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) submit async
> job-1378, details: AsyncJobVO {id:1378, userId: 2, accountId: 2,
> instanceType: Snapshot, instanceId: 13, cmd:
> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo:
> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
> null, lastPolled: null, created: null, removed: null}
>
> 2019-09-07 23:27:00,220 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:1cee5097) Executing
> AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType: Snapshot,
> instanceId: 13, cmd:
> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo:
> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface
> com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
> null, lastPolled: null, created: null, removed: null}
>
> 2019-09-07 23:27:00,221 DEBUG [c.c.a.ApiServlet]
> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) ===END===
> 192.168.254.3 -- GET
> command=deleteSnapshot=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f=json&_=1567869534480
>
> 2019-09-07 23:27:00,305 DEBUG [c.c.a.m.ClusteredAgentAttache]
> (AgentManager-Handler-12:null) (logid:) Seq 1-8660140608456756853: Routing
> from 2199066247173
>
> 2019-09-07 23:27:00,305 DEBUG [o.a.c.s.s.XenserverSnapshotStrategy]
> (API-Job-Executor-2:ctx-f0843047 job-1378 ctx-f50e25a4) (logid:1cee5097)
> Can't find snapshot on backup storage, delete it in db
>
>
>
> -Jerry
>
>
>
> 
> From: Andrija Panic 
> Sent: Saturday, September 7, 2019 1:07:19 AM
> To: users 
> Cc: d...@cloudstack.apache.org 
> Subject: Re: 4.13 rbd snapshot delete failed
>
> storage.cleanup.delay
> storage.cleanup.interval
>
> put both to 60 (seconds) and wait for up to 2min - should be deleted just
> fine...
>
> cheers
>
> On Fri, 6 Sep 2019 at 18:52, li jerry  wrote:
>
> > Hello All
> >
> > When I tested ACS 4.13 KVM + CEPH snapshot, I found that snapshots could
> > be created and rolled back (using API alone), but deletion could not be
> > completed.
> >
> >
> >
> > After executing the deletion API, the snapshot will disappear from the
> > list Snapshots, but the snapshot on CEPH RBD will not be deleted (rbd
> snap
> > list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c)
> >
> >
> >
> > Is there any way we can completely delete the snapshot?
> >
> > -Jerry
> >
> >
>
> --
>
> Andrija Panić
>


Re: 4.13 rbd snapshot delete failed

2019-09-07 Thread Wido den Hollander



On 9/6/19 11:34 PM, Andrija Panic wrote:
> One question though... for me (4.13, Nautilus 14.2, test env) - it fails to
> revert back to snapshot with below error
> 

Ok, that's weird.

Gabriel worked on this code recently; maybe he can take a look. I'll
ping him.

Wido

> Which CEPH and QEMU/libvirt/os versions are you using?
> 
> 
> Error:
> 2019-09-06 21:27:16,094 ERROR
> [resource.wrapper.LibvirtRevertSnapshotCommandWrapper]
> (agentRequest-Handler-3:null) (logid:9593f65a) Failed to connect to revert
> snapshot due to RBD exception:
> com.ceph.rbd.RbdException: Failed to open image 2
> at com.ceph.rbd.Rbd.open(Rbd.java:243)
> at com.ceph.rbd.Rbd.open(Rbd.java:226)
> at
> com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRevertSnapshotCommandWrapper.execute(LibvirtRevertSnapshotCommandWrapper.java:92)
> at
> com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRevertSnapshotCommandWrapper.execute(LibvirtRevertSnapshotCommandWrapper.java:49)
> at
> com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
> at
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1476)
> at com.cloud.agent.Agent.processRequest(Agent.java:640)
> at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1053)
> at com.cloud.utils.nio.Task.call(Task.java:83)
> at com.cloud.utils.nio.Task.call(Task.java:29)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 
> On Fri, 6 Sep 2019 at 19:07, Andrija Panic  wrote:
> 
>> storage.cleanup.delay
>> storage.cleanup.interval
>>
>> put both to 60 (seconds) and wait for up to 2min - should be deleted just
>> fine...
>>
>> cheers
>>
>> On Fri, 6 Sep 2019 at 18:52, li jerry  wrote:
>>
>>> Hello All
>>>
>>> When I tested ACS 4.13 KVM + CEPH snapshot, I found that snapshots could
>>> be created and rolled back (using API alone), but deletion could not be
>>> completed.
>>>
>>>
>>>
>>> After executing the deletion API, the snapshot will disappear from the
>>> list Snapshots, but the snapshot on CEPH RBD will not be deleted (rbd snap
>>> list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c)
>>>
>>>
>>>
>>> Is there any way we can completely delete the snapshot?
>>>
>>> -Jerry
>>>
>>>
>>
>> --
>>
>> Andrija Panić
>>
> 
> 


Re: 4.13 rbd snapshot delete failed

2019-09-06 Thread Andrija Panic
One question though... for me (4.13, Nautilus 14.2, test env) - it fails to
revert back to a snapshot with the error below.

Which Ceph and QEMU/libvirt/OS versions are you using?


Error:
2019-09-06 21:27:16,094 ERROR
[resource.wrapper.LibvirtRevertSnapshotCommandWrapper]
(agentRequest-Handler-3:null) (logid:9593f65a) Failed to connect to revert
snapshot due to RBD exception:
com.ceph.rbd.RbdException: Failed to open image 2
at com.ceph.rbd.Rbd.open(Rbd.java:243)
at com.ceph.rbd.Rbd.open(Rbd.java:226)
at
com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRevertSnapshotCommandWrapper.execute(LibvirtRevertSnapshotCommandWrapper.java:92)
at
com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRevertSnapshotCommandWrapper.execute(LibvirtRevertSnapshotCommandWrapper.java:49)
at
com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
at
com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1476)
at com.cloud.agent.Agent.processRequest(Agent.java:640)
at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1053)
at com.cloud.utils.nio.Task.call(Task.java:83)
at com.cloud.utils.nio.Task.call(Task.java:29)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
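
For reference, the revert flow on the agent side essentially opens the image
and rolls it back to the named snapshot via rados-java, roughly like the sketch
below. This is a simplified reading, not the actual wrapper code; connection
details, pool, image, and snapshot names are placeholders, and the failure
above at Rbd.open() suggests that whatever name/pool was derived from the
snapshot path is not an image librbd can find:

import com.ceph.rados.IoCTX;
import com.ceph.rados.Rados;
import com.ceph.rbd.Rbd;
import com.ceph.rbd.RbdImage;

public class RbdRevertSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details - on the agent these come from the primary storage pool.
        Rados rados = new Rados("admin");
        rados.confSet("mon_host", "10.0.0.1:6789");
        rados.confSet("key", "<cephx-key>");
        rados.connect();

        IoCTX io = rados.ioCtxCreate("rbd"); // pool name
        Rbd rbd = new Rbd(io);

        // Rbd.open() expects the bare image name inside the pool; anything else
        // (a "pool/image@snap" path, a secondary-storage relative path, ...)
        // fails with an RbdException like the "Failed to open image" above.
        RbdImage image = rbd.open("ac510428-5d09-4e86-9d34-9dfab3715b7c");

        // Roll the image back to the snapshot (snapRollBack is available in
        // rados-java 0.5.0+ as far as I can tell - verify against your version).
        image.snapRollBack("my-cloudstack-snapshot");

        rbd.close(image);
        rados.ioCtxDestroy(io);
    }
}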

On Fri, 6 Sep 2019 at 19:07, Andrija Panic  wrote:

> storage.cleanup.delay
> storage.cleanup.interval
>
> put both to 60 (seconds) and wait for up to 2min - should be deleted just
> fine...
>
> cheers
>
> On Fri, 6 Sep 2019 at 18:52, li jerry  wrote:
>
>> Hello All
>>
>> When I tested ACS 4.13 KVM + CEPH snapshot, I found that snapshots could
>> be created and rolled back (using API alone), but deletion could not be
>> completed.
>>
>>
>>
>> After executing the deletion API, the snapshot will disappear from the
>> list Snapshots, but the snapshot on CEPH RBD will not be deleted (rbd snap
>> list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c)
>>
>>
>>
>> Is there any way we can completely delete the snapshot?
>>
>> -Jerry
>>
>>
>
> --
>
> Andrija Panić
>


-- 

Andrija Panić


Re: 4.13 rbd snapshot delete failed

2019-09-06 Thread Andrija Panic
storage.cleanup.delay
storage.cleanup.interval

put both to 60 (seconds) and wait for up to 2min - should be deleted just
fine...

cheers

On Fri, 6 Sep 2019 at 18:52, li jerry  wrote:

> Hello All
>
> When I tested ACS 4.13 KVM + CEPH snapshot, I found that snapshots could
> be created and rolled back (using API alone), but deletion could not be
> completed.
>
>
>
> After executing the deletion API, the snapshot will disappear from the
> list Snapshots, but the snapshot on CEPH RBD will not be deleted (rbd snap
> list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c)
>
>
>
> Is there any way we can completely delete the snapshot?
>
> -Jerry
>
>

-- 

Andrija Panić


4.13 rbd snapshot delete failed

2019-09-06 Thread li jerry
Hello All

When I tested ACS 4.13 KVM + Ceph snapshots, I found that snapshots could be
created and rolled back (using the API alone), but deletion could not be completed.



After executing the deletion API, the snapshot disappears from the snapshot list,
but the snapshot on the Ceph RBD side is not deleted (rbd snap list
rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c).



Is there any way we can completely delete the snapshot?

-Jerry
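
Until deletion is handled properly, the leftover snapshots can be cleaned up
outside of CloudStack, either with the rbd CLI (rbd snap rm, or rbd snap purge
for all snapshots on an image) or programmatically with rados-java, roughly as
sketched below. Connection details, pool, and image name are placeholders, and
the calls follow my understanding of the rados-java API, so double-check before
running it against production images:

import java.util.List;

import com.ceph.rados.IoCTX;
import com.ceph.rados.Rados;
import com.ceph.rbd.Rbd;
import com.ceph.rbd.RbdImage;
import com.ceph.rbd.jna.RbdSnapInfo;

public class RbdSnapshotCleanup {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details - use your monitors, cephx key, pool, and image.
        Rados rados = new Rados("admin");
        rados.confSet("mon_host", "10.0.0.1:6789");
        rados.confSet("key", "<cephx-key>");
        rados.connect();

        IoCTX io = rados.ioCtxCreate("rbd");
        Rbd rbd = new Rbd(io);
        RbdImage image = rbd.open("ac510428-5d09-4e86-9d34-9dfab3715b7c");

        // Remove every snapshot left behind on the image (roughly `rbd snap purge`).
        List<RbdSnapInfo> snaps = image.snapList();
        for (RbdSnapInfo snap : snaps) {
            if (image.snapIsProtected(snap.name)) {
                image.snapUnprotect(snap.name); // protected snapshots must be unprotected first
            }
            image.snapRemove(snap.name);
            System.out.println("Removed snapshot " + snap.name);
        }

        rbd.close(image);
        rados.ioCtxDestroy(io);
    }
}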