[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-28 Thread Muli Ben-Yehuda
Hi Benny,

Any update on this one?
Also, is there a way I can test this with vdsm-client without resorting to
full ovirt? We have run into some issues with getting ovirt working with
the nightlies, but vdsm and vdsm-client appear to work fine with the
patches applied, or at least, they run.

Cheers,
Muli

On Thu, Mar 3, 2022 at 6:09 PM Benny Zlotnik  wrote:

> Hi,
>
> I posted draft PRs for engine[1] and vdsm[2], they are still raw and I
> only tested running starting VMs with ceph. If you can apply the
> changes for lightos (only vdsm should be needed) and try it out it
> would be great :)
> Also, if you have any suggestions/comments/etc feel free to comment on
> the PRs directly
>
> If you don't want to build ovirt-engine from source, CI generated RPMs
> should be available in[3] (the job is still running while I'm writing
> this email)
>
> [1] https://github.com/oVirt/ovirt-engine/pull/104
> [2] https://github.com/oVirt/vdsm/pull/89
> [3] https://github.com/oVirt/ovirt-engine/actions/runs/1929008680
>
>
> On Wed, Mar 2, 2022 at 4:55 PM Muli Ben-Yehuda 
> wrote:
> >
> > Thanks for the update, Benny. How can I help? For example, would logs
> from running the connector with the exact data it returns be useful?
> >
> > Cheers,
> > Muli
> >
> > On Tue, Mar 1, 2022 at 8:39 PM Benny Zlotnik 
> wrote:
> >>
> >> Hi,
> >>
> >> Just by browsing the code, I can think of one issue in[1], as a result
> >> of[2] where we only considered iscsi and rbd drivers, I suspect your
> >> driver will go into this branch, based on the issue in the 4.3 logs I
> >> went over:
> >>
> backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java
> >>
> >> } else if (managedBlockStorageDisk.getCinderVolumeDriver()
> >> == CinderVolumeDriver.BLOCK) {
> >> Map attachment =
> >> (Map)
> >> managedBlockStorageDisk.getDevice().get(DeviceInfoReturn.ATTACHMENT);
> >> metadata = Map.of(
> >> "GUID",
> >> (String)attachment.get(DeviceInfoReturn.SCSI_WWN),
> >> "managed", "true"
> >>
> >> Which will make it go into the wrong branch in clientIF.py, appending
> >> the empty GUID to /dev/mapper. Perhaps it is possible workaround it in
> >> clientIF if you just want to try and get the VM started for now, by
> >> checking if GUID is empty and deferring to:
> >>volPath = drive['path']
> >>
> >> But as discussed in this thread, our attempt at constructing the
> >> stable paths ourselves doesn't really scale. After further discussion
> >> with Nir I started working on creating a link in vdsm in
> >> managevolume.py#attach_volume to the path returned by the driver, and
> >> engine will use our link to run the VMs.
> >> This should simplify the code and resolve the live VM migration issue.
> >> I had some preliminary success with this so I'll try to post the
> >> patches soon
> >>
> >>
> >> [1]
> https://github.com/oVirt/vdsm/blob/d957a06a4d988489c83da171fcd9cfd254b12ca4/lib/vdsm/clientIF.py#L462
> >> [2]
> https://github.com/oVirt/ovirt-engine/blob/24530d17874e20581deee4b0e319146cdcacb8f5/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L2424
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Mar 1, 2022 at 6:12 PM Muli Ben-Yehuda 
> wrote:
> >> >
> >> > Will this support require changes in ovirt-engine or just in vdsm? I
> have started to look into vdsm's managedvolume.py and its tests and it
> seems like adding support for LightOS there should be pretty simple (famous
> last words...). Should this be enough or do you think it will require
> changes in other parts of ovirt as well?
> >> >
> >> > Cheers,
> >> > Muli
> >> >
> >> > On Mon, Feb 28, 2022 at 9:09 AM Nir Soffer 
> wrote:
> >> >>
> >> >> On Fri, Feb 25, 2022 at 12:04 PM Gorka Eguileor 
> wrote:
> >> >> >
> >> >> > On 24/02, Nir Soffer wrote:
> >> >> > > On Thu, Feb 24, 2022 at 8:46 PM Gorka Eguileor <
> gegui...@redhat.com> wrote:
> >> >> > > >
> >> >> > > > On 24/02, Nir Soffer wrote:
> >> >> > > > > On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda <
> m...@lightbitslabs.com> wrote:
> >> >> > > > > >
> >> >> > > > > > On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer <
> nsof...@redhat.com> wrote:
> >> >> > > > > >>
> >> >> > > > > >> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda <
> m...@lightbitslabs.com> wrote:
> >> >> > > > > >> >
> >> >> > > > > >> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer <
> nsof...@redhat.com> wrote:
> >> >> > > > > >> >>
> >> >> > > > > >> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda <
> m...@lightbitslabs.com> wrote:
> >> >> > > > > >> >> >
> >> >> > > > > >> >> > Thanks for the detailed instructions, Nir. I'm going
> to scrounge up some hardware.
> >> >> > > > > >> >> > By the way, if anyone else would like to work on
> NVMe/TCP support, for NVMe/TCP target you can either use Lightbits (talk to
> me 

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-27 Thread Benny Zlotnik
Hi, I replied in the PR

Regarding testing this with vdsm-client: in theory it's possible, but
it would be quite difficult, as you'd have to prepare the data center and
add traditional storage (vdsm-managed iscsi/nfs) with:
  $ vdsm-client StoragePool create/connect
then work through the StorageDomain namespace, and probably a bunch of
other things ovirt-engine does automatically, before you can get to the
  $ vdsm-client ManagedVolume attach_volume
operations. I am not sure how practical it is to do this; I am pretty
sure it would be much easier to do it with ovirt-engine...

Can you share what issues you ran into with ovirt-engine?
I rebased my engine PR[1], which is required to test this; new RPMs
should be available soon.

[1] https://github.com/oVirt/ovirt-engine/pull/104
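
For reference, a rough sketch of what a vdsm-only attach test could look
like using vdsm's Python client API instead of the CLI. The ManagedVolume
verb signatures, the LightOS connection_info payload, and the connection
parameters below are assumptions used to illustrate the flow, and should be
checked against the installed vdsm API schema:

# Hedged sketch: exercise ManagedVolume attach/detach directly against a
# vdsm host, bypassing ovirt-engine. Verify the verb signatures against
# vdsm-client / the vdsm API schema before relying on this.
from vdsm import client

cli = client.connect("localhost", port=54321)  # local vdsm, default port

vol_id = "11111111-2222-3333-4444-555555555555"   # hypothetical volume UUID
connection_info = {                               # hypothetical LightOS payload
    "driver_volume_type": "lightos",
    "data": {"uuid": vol_id},
}

try:
    attachment = cli.ManagedVolume.attach_volume(
        vol_id=vol_id, connection_info=connection_info)
    print("attached at:", attachment)
finally:
    cli.ManagedVolume.detach_volume(vol_id=vol_id)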


On Sun, Mar 27, 2022 at 1:47 PM Muli Ben-Yehuda  wrote:
>
> Hi Benny,
>
> Any update on this one?
> Also, is there a way I can test this with vdsm-client without resorting to 
> full ovirt? We have run into some issues with getting ovirt working with the 
> nightlies, but vdsm and vdsm-client appear to work fine with the patches 
> applied, or at least, they run.
>
> Cheers,
> Muli
>
> On Thu, Mar 3, 2022 at 6:09 PM Benny Zlotnik  wrote:
>>
>> Hi,
>>
>> I posted draft PRs for engine[1] and vdsm[2], they are still raw and I
>> only tested running starting VMs with ceph. If you can apply the
>> changes for lightos (only vdsm should be needed) and try it out it
>> would be great :)
>> Also, if you have any suggestions/comments/etc feel free to comment on
>> the PRs directly
>>
>> If you don't want to build ovirt-engine from source, CI generated RPMs
>> should be available in[3] (the job is still running while I'm writing
>> this email)
>>
>> [1] https://github.com/oVirt/ovirt-engine/pull/104
>> [2] https://github.com/oVirt/vdsm/pull/89
>> [3] https://github.com/oVirt/ovirt-engine/actions/runs/1929008680
>>
>>
>> On Wed, Mar 2, 2022 at 4:55 PM Muli Ben-Yehuda  
>> wrote:
>> >
>> > Thanks for the update, Benny. How can I help? For example, would logs from 
>> > running the connector with the exact data it returns be useful?
>> >
>> > Cheers,
>> > Muli
>> >
>> > On Tue, Mar 1, 2022 at 8:39 PM Benny Zlotnik  wrote:
>> >>
>> >> Hi,
>> >>
>> >> Just by browsing the code, I can think of one issue in[1], as a result
>> >> of[2] where we only considered iscsi and rbd drivers, I suspect your
>> >> driver will go into this branch, based on the issue in the 4.3 logs I
>> >> went over:
>> >> backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java
>> >>
>> >> } else if (managedBlockStorageDisk.getCinderVolumeDriver()
>> >> == CinderVolumeDriver.BLOCK) {
>> >> Map attachment =
>> >> (Map)
>> >> managedBlockStorageDisk.getDevice().get(DeviceInfoReturn.ATTACHMENT);
>> >> metadata = Map.of(
>> >> "GUID",
>> >> (String)attachment.get(DeviceInfoReturn.SCSI_WWN),
>> >> "managed", "true"
>> >>
>> >> Which will make it go into the wrong branch in clientIF.py, appending
>> >> the empty GUID to /dev/mapper. Perhaps it is possible workaround it in
>> >> clientIF if you just want to try and get the VM started for now, by
>> >> checking if GUID is empty and deferring to:
>> >>volPath = drive['path']
>> >>
>> >> But as discussed in this thread, our attempt at constructing the
>> >> stable paths ourselves doesn't really scale. After further discussion
>> >> with Nir I started working on creating a link in vdsm in
>> >> managevolume.py#attach_volume to the path returned by the driver, and
>> >> engine will use our link to run the VMs.
>> >> This should simplify the code and resolve the live VM migration issue.
>> >> I had some preliminary success with this so I'll try to post the
>> >> patches soon
>> >>
>> >>
>> >> [1] 
>> >> https://github.com/oVirt/vdsm/blob/d957a06a4d988489c83da171fcd9cfd254b12ca4/lib/vdsm/clientIF.py#L462
>> >> [2] 
>> >> https://github.com/oVirt/ovirt-engine/blob/24530d17874e20581deee4b0e319146cdcacb8f5/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L2424
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Tue, Mar 1, 2022 at 6:12 PM Muli Ben-Yehuda  
>> >> wrote:
>> >> >
>> >> > Will this support require changes in ovirt-engine or just in vdsm? I 
>> >> > have started to look into vdsm's managedvolume.py and its tests and it 
>> >> > seems like adding support for LightOS there should be pretty simple 
>> >> > (famous last words...). Should this be enough or do you think it will 
>> >> > require changes in other parts of ovirt as well?
>> >> >
>> >> > Cheers,
>> >> > Muli
>> >> >
>> >> > On Mon, Feb 28, 2022 at 9:09 AM Nir Soffer  wrote:
>> >> >>
>> >> >> On Fri, Feb 25, 2022 at 12:04 PM Gorka Eguileor  
>> >> >> wrote:
>> >> >> >
>> >> >> > On 24/02, Nir Soffer wrote:
>> >> >> > > On 

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-03 Thread Benny Zlotnik
Hi,

I posted draft PRs for engine[1] and vdsm[2]. They are still raw, and I
have only tested starting VMs with ceph. If you can apply the
changes for lightos (only vdsm should be needed) and try it out, that
would be great :)
Also, if you have any suggestions/comments/etc., feel free to comment on
the PRs directly.

If you don't want to build ovirt-engine from source, CI-generated RPMs
should be available in [3] (the job is still running while I'm writing
this email).

[1] https://github.com/oVirt/ovirt-engine/pull/104
[2] https://github.com/oVirt/vdsm/pull/89
[3] https://github.com/oVirt/ovirt-engine/actions/runs/1929008680


On Wed, Mar 2, 2022 at 4:55 PM Muli Ben-Yehuda  wrote:
>
> Thanks for the update, Benny. How can I help? For example, would logs from 
> running the connector with the exact data it returns be useful?
>
> Cheers,
> Muli
>
> On Tue, Mar 1, 2022 at 8:39 PM Benny Zlotnik  wrote:
>>
>> Hi,
>>
>> Just by browsing the code, I can think of one issue in[1], as a result
>> of[2] where we only considered iscsi and rbd drivers, I suspect your
>> driver will go into this branch, based on the issue in the 4.3 logs I
>> went over:
>> backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java
>>
>> } else if (managedBlockStorageDisk.getCinderVolumeDriver()
>> == CinderVolumeDriver.BLOCK) {
>> Map attachment =
>> (Map)
>> managedBlockStorageDisk.getDevice().get(DeviceInfoReturn.ATTACHMENT);
>> metadata = Map.of(
>> "GUID",
>> (String)attachment.get(DeviceInfoReturn.SCSI_WWN),
>> "managed", "true"
>>
>> Which will make it go into the wrong branch in clientIF.py, appending
>> the empty GUID to /dev/mapper. Perhaps it is possible workaround it in
>> clientIF if you just want to try and get the VM started for now, by
>> checking if GUID is empty and deferring to:
>>volPath = drive['path']
>>
>> But as discussed in this thread, our attempt at constructing the
>> stable paths ourselves doesn't really scale. After further discussion
>> with Nir I started working on creating a link in vdsm in
>> managevolume.py#attach_volume to the path returned by the driver, and
>> engine will use our link to run the VMs.
>> This should simplify the code and resolve the live VM migration issue.
>> I had some preliminary success with this so I'll try to post the
>> patches soon
>>
>>
>> [1] 
>> https://github.com/oVirt/vdsm/blob/d957a06a4d988489c83da171fcd9cfd254b12ca4/lib/vdsm/clientIF.py#L462
>> [2] 
>> https://github.com/oVirt/ovirt-engine/blob/24530d17874e20581deee4b0e319146cdcacb8f5/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L2424
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Mar 1, 2022 at 6:12 PM Muli Ben-Yehuda  
>> wrote:
>> >
>> > Will this support require changes in ovirt-engine or just in vdsm? I have 
>> > started to look into vdsm's managedvolume.py and its tests and it seems 
>> > like adding support for LightOS there should be pretty simple (famous last 
>> > words...). Should this be enough or do you think it will require changes 
>> > in other parts of ovirt as well?
>> >
>> > Cheers,
>> > Muli
>> >
>> > On Mon, Feb 28, 2022 at 9:09 AM Nir Soffer  wrote:
>> >>
>> >> On Fri, Feb 25, 2022 at 12:04 PM Gorka Eguileor  
>> >> wrote:
>> >> >
>> >> > On 24/02, Nir Soffer wrote:
>> >> > > On Thu, Feb 24, 2022 at 8:46 PM Gorka Eguileor  
>> >> > > wrote:
>> >> > > >
>> >> > > > On 24/02, Nir Soffer wrote:
>> >> > > > > On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda 
>> >> > > > >  wrote:
>> >> > > > > >
>> >> > > > > > On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer  
>> >> > > > > > wrote:
>> >> > > > > >>
>> >> > > > > >> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda 
>> >> > > > > >>  wrote:
>> >> > > > > >> >
>> >> > > > > >> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer 
>> >> > > > > >> >  wrote:
>> >> > > > > >> >>
>> >> > > > > >> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda 
>> >> > > > > >> >>  wrote:
>> >> > > > > >> >> >
>> >> > > > > >> >> > Thanks for the detailed instructions, Nir. I'm going to 
>> >> > > > > >> >> > scrounge up some hardware.
>> >> > > > > >> >> > By the way, if anyone else would like to work on NVMe/TCP 
>> >> > > > > >> >> > support, for NVMe/TCP target you can either use Lightbits 
>> >> > > > > >> >> > (talk to me offline for details) or use the upstream 
>> >> > > > > >> >> > Linux NVMe/TCP target. Lightbits is a clustered storage 
>> >> > > > > >> >> > system while upstream is a single target, but the client 
>> >> > > > > >> >> > side should be close enough for vdsm/ovirt purposes.
>> >> > > > > >> >>
>> >> > > > > >> >> I played with NVMe/TCP a little bit, using qemu to create a 
>> >> > > > > >> >> virtual
>> >> > > > > >> >> NVMe disk, and export
>> >> > > > > >> >> it using the kernel on one VM, and consume it on another 

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Muli Ben-Yehuda
Thanks for the update, Benny. How can I help? For example, would logs from
running the connector with the exact data it returns be useful?

Cheers,
Muli

On Tue, Mar 1, 2022 at 8:39 PM Benny Zlotnik  wrote:

> Hi,
>
> Just by browsing the code, I can think of one issue in[1], as a result
> of[2] where we only considered iscsi and rbd drivers, I suspect your
> driver will go into this branch, based on the issue in the 4.3 logs I
> went over:
>
> backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java
>
> } else if (managedBlockStorageDisk.getCinderVolumeDriver()
> == CinderVolumeDriver.BLOCK) {
> Map attachment =
> (Map)
> managedBlockStorageDisk.getDevice().get(DeviceInfoReturn.ATTACHMENT);
> metadata = Map.of(
> "GUID",
> (String)attachment.get(DeviceInfoReturn.SCSI_WWN),
> "managed", "true"
>
> Which will make it go into the wrong branch in clientIF.py, appending
> the empty GUID to /dev/mapper. Perhaps it is possible workaround it in
> clientIF if you just want to try and get the VM started for now, by
> checking if GUID is empty and deferring to:
>volPath = drive['path']
>
> But as discussed in this thread, our attempt at constructing the
> stable paths ourselves doesn't really scale. After further discussion
> with Nir I started working on creating a link in vdsm in
> managevolume.py#attach_volume to the path returned by the driver, and
> engine will use our link to run the VMs.
> This should simplify the code and resolve the live VM migration issue.
> I had some preliminary success with this so I'll try to post the
> patches soon
>
>
> [1]
> https://github.com/oVirt/vdsm/blob/d957a06a4d988489c83da171fcd9cfd254b12ca4/lib/vdsm/clientIF.py#L462
> [2]
> https://github.com/oVirt/ovirt-engine/blob/24530d17874e20581deee4b0e319146cdcacb8f5/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L2424
>
>
>
>
>
>
>
>
>
>
> On Tue, Mar 1, 2022 at 6:12 PM Muli Ben-Yehuda 
> wrote:
> >
> > Will this support require changes in ovirt-engine or just in vdsm? I
> have started to look into vdsm's managedvolume.py and its tests and it
> seems like adding support for LightOS there should be pretty simple (famous
> last words...). Should this be enough or do you think it will require
> changes in other parts of ovirt as well?
> >
> > Cheers,
> > Muli
> >
> > On Mon, Feb 28, 2022 at 9:09 AM Nir Soffer  wrote:
> >>
> >> On Fri, Feb 25, 2022 at 12:04 PM Gorka Eguileor 
> wrote:
> >> >
> >> > On 24/02, Nir Soffer wrote:
> >> > > On Thu, Feb 24, 2022 at 8:46 PM Gorka Eguileor 
> wrote:
> >> > > >
> >> > > > On 24/02, Nir Soffer wrote:
> >> > > > > On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda <
> m...@lightbitslabs.com> wrote:
> >> > > > > >
> >> > > > > > On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer <
> nsof...@redhat.com> wrote:
> >> > > > > >>
> >> > > > > >> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda <
> m...@lightbitslabs.com> wrote:
> >> > > > > >> >
> >> > > > > >> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer <
> nsof...@redhat.com> wrote:
> >> > > > > >> >>
> >> > > > > >> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda <
> m...@lightbitslabs.com> wrote:
> >> > > > > >> >> >
> >> > > > > >> >> > Thanks for the detailed instructions, Nir. I'm going to
> scrounge up some hardware.
> >> > > > > >> >> > By the way, if anyone else would like to work on
> NVMe/TCP support, for NVMe/TCP target you can either use Lightbits (talk to
> me offline for details) or use the upstream Linux NVMe/TCP target.
> Lightbits is a clustered storage system while upstream is a single target,
> but the client side should be close enough for vdsm/ovirt purposes.
> >> > > > > >> >>
> >> > > > > >> >> I played with NVMe/TCP a little bit, using qemu to create
> a virtual
> >> > > > > >> >> NVMe disk, and export
> >> > > > > >> >> it using the kernel on one VM, and consume it on another
> VM.
> >> > > > > >> >>
> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
> >> > > > > >> >>
> >> > > > > >> >> One question about device naming - do we always get the
> same name of the
> >> > > > > >> >> device in all hosts?
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > No, we do not, see below how we handle migration in
> os_brick.
> >> > > > > >> >
> >> > > > > >> >> To support VM migration, every device must have unique
> name in the cluster.
> >> > > > > >> >> With multipath we always have unique name, since we
> disable "friendly names",
> >> > > > > >> >> so we always have:
> >> > > > > >> >>
> >> > > > > >> >> /dev/mapper/{wwid}
> >> > > > > >> >>
> >> > > > > >> >> With rbd we also do not use /dev/rbdN but a unique path:
> >> > > > > >> >>
> >> > > > > >> >> /dev/rbd/poolname/volume-vol-id
> >> > > > > >> >>
> >> > > > > >> >> How do we ensure cluster-unique 

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Muli Ben-Yehuda
Will this support require changes in ovirt-engine or just in vdsm? I have
started to look into vdsm's managedvolume.py and its tests and it seems
like adding support for LightOS there should be pretty simple (famous last
words...). Should this be enough or do you think it will require changes in
other parts of ovirt as well?

Cheers,
Muli

On Mon, Feb 28, 2022 at 9:09 AM Nir Soffer  wrote:

> On Fri, Feb 25, 2022 at 12:04 PM Gorka Eguileor 
> wrote:
> >
> > On 24/02, Nir Soffer wrote:
> > > On Thu, Feb 24, 2022 at 8:46 PM Gorka Eguileor 
> wrote:
> > > >
> > > > On 24/02, Nir Soffer wrote:
> > > > > On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda <
> m...@lightbitslabs.com> wrote:
> > > > > >
> > > > > > On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer 
> wrote:
> > > > > >>
> > > > > >> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda <
> m...@lightbitslabs.com> wrote:
> > > > > >> >
> > > > > >> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer <
> nsof...@redhat.com> wrote:
> > > > > >> >>
> > > > > >> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda <
> m...@lightbitslabs.com> wrote:
> > > > > >> >> >
> > > > > >> >> > Thanks for the detailed instructions, Nir. I'm going to
> scrounge up some hardware.
> > > > > >> >> > By the way, if anyone else would like to work on NVMe/TCP
> support, for NVMe/TCP target you can either use Lightbits (talk to me
> offline for details) or use the upstream Linux NVMe/TCP target. Lightbits
> is a clustered storage system while upstream is a single target, but the
> client side should be close enough for vdsm/ovirt purposes.
> > > > > >> >>
> > > > > >> >> I played with NVMe/TCP a little bit, using qemu to create a
> virtual
> > > > > >> >> NVMe disk, and export
> > > > > >> >> it using the kernel on one VM, and consume it on another VM.
> > > > > >> >>
> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
> > > > > >> >>
> > > > > >> >> One question about device naming - do we always get the same
> name of the
> > > > > >> >> device in all hosts?
> > > > > >> >
> > > > > >> >
> > > > > >> > No, we do not, see below how we handle migration in os_brick.
> > > > > >> >
> > > > > >> >> To support VM migration, every device must have unique name
> in the cluster.
> > > > > >> >> With multipath we always have unique name, since we disable
> "friendly names",
> > > > > >> >> so we always have:
> > > > > >> >>
> > > > > >> >> /dev/mapper/{wwid}
> > > > > >> >>
> > > > > >> >> With rbd we also do not use /dev/rbdN but a unique path:
> > > > > >> >>
> > > > > >> >> /dev/rbd/poolname/volume-vol-id
> > > > > >> >>
> > > > > >> >> How do we ensure cluster-unique device path? If os_brick
> does not handle it, we
> > > > > >> >> can to do in ovirt, for example:
> > > > > >> >>
> > > > > >> >> /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42
> > > > > >> >>
> > > > > >> >> but I think this should be handled in cinderlib, since
> openstack have
> > > > > >> >> the same problem with migration.
> > > > > >> >
> > > > > >> >
> > > > > >> > Indeed. Both the Lightbits LightOS connector and the nvmeof
> connector do this through the target provided namespace (LUN) UUID. After
> connecting to the target, the connectors wait for the local friendly-named
> device file that has the right UUID to show up, and then return the
> friendly name. So different hosts will have different friendly names, but
> the VMs will be attached to the right namespace since we return the
> friendly name on the current host that has the right UUID. Does this also
> work for you?
> > > > > >>
> > > > > >> It will not work for oVirt.
> > > > > >>
> > > > > >> Migration in oVirt works like this:
> > > > > >>
> > > > > >> 1. Attach disks to destination host
> > > > > >> 2. Send VM XML from source host to destination host, and start
> the
> > > > > >>VM is paused mode
> > > > > >> 3. Start the migration on the source host
> > > > > >> 4. When migration is done, start the CPU on the destination host
> > > > > >> 5. Detach the disks from the source
> > > > > >>
> > > > > >> This will break in step 2, since the source xml refer to nvme
> device
> > > > > >> that does not exist or already used by another VM.
> > > > > >
> > > > > >
> > > > > > Indeed.
> > > > > >
> > > > > >> To make this work, the VM XML must use the same path, existing
> on
> > > > > >> both hosts.
> > > > > >>
> > > > > >> The issue can be solved by libvirt hook updating the paths
> before qemu
> > > > > >> is started on the destination, but I think the right way to
> handle this is to
> > > > > >> have the same path.
> > > > > >
> > > > > >
> > > > > >  You mentioned above that it can be handled in ovirt (c.f.,
> /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42), which seems like a
> reasonable approach given the constraint imposed by the oVirt migration
> flow you outlined above. What information does vdsm need to create and use
> the /var/run/vdsm/managedvolumes/{uuid} link? Today the connector does
> (trimmed for brevity):
> > > 

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Gorka Eguileor
On 24/02, Nir Soffer wrote:
> On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda  
> wrote:
> >
> > On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer  wrote:
> >>
> >> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda  
> >> wrote:
> >> >
> >> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer  wrote:
> >> >>
> >> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda 
> >> >>  wrote:
> >> >> >
> >> >> > Thanks for the detailed instructions, Nir. I'm going to scrounge up 
> >> >> > some hardware.
> >> >> > By the way, if anyone else would like to work on NVMe/TCP support, 
> >> >> > for NVMe/TCP target you can either use Lightbits (talk to me offline 
> >> >> > for details) or use the upstream Linux NVMe/TCP target. Lightbits is 
> >> >> > a clustered storage system while upstream is a single target, but the 
> >> >> > client side should be close enough for vdsm/ovirt purposes.
> >> >>
> >> >> I played with NVMe/TCP a little bit, using qemu to create a virtual
> >> >> NVMe disk, and export
> >> >> it using the kernel on one VM, and consume it on another VM.
> >> >> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
> >> >>
> >> >> One question about device naming - do we always get the same name of the
> >> >> device in all hosts?
> >> >
> >> >
> >> > No, we do not, see below how we handle migration in os_brick.
> >> >
> >> >> To support VM migration, every device must have unique name in the 
> >> >> cluster.
> >> >> With multipath we always have unique name, since we disable "friendly 
> >> >> names",
> >> >> so we always have:
> >> >>
> >> >> /dev/mapper/{wwid}
> >> >>
> >> >> With rbd we also do not use /dev/rbdN but a unique path:
> >> >>
> >> >> /dev/rbd/poolname/volume-vol-id
> >> >>
> >> >> How do we ensure cluster-unique device path? If os_brick does not 
> >> >> handle it, we
> >> >> can to do in ovirt, for example:
> >> >>
> >> >> /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42
> >> >>
> >> >> but I think this should be handled in cinderlib, since openstack have
> >> >> the same problem with migration.
> >> >
> >> >
> >> > Indeed. Both the Lightbits LightOS connector and the nvmeof connector do 
> >> > this through the target provided namespace (LUN) UUID. After connecting 
> >> > to the target, the connectors wait for the local friendly-named device 
> >> > file that has the right UUID to show up, and then return the friendly 
> >> > name. So different hosts will have different friendly names, but the VMs 
> >> > will be attached to the right namespace since we return the friendly 
> >> > name on the current host that has the right UUID. Does this also work 
> >> > for you?
> >>
> >> It will not work for oVirt.
> >>
> >> Migration in oVirt works like this:
> >>
> >> 1. Attach disks to destination host
> >> 2. Send VM XML from source host to destination host, and start the
> >>VM is paused mode
> >> 3. Start the migration on the source host
> >> 4. When migration is done, start the CPU on the destination host
> >> 5. Detach the disks from the source
> >>
> >> This will break in step 2, since the source xml refer to nvme device
> >> that does not exist or already used by another VM.
> >
> >
> > Indeed.
> >
> >> To make this work, the VM XML must use the same path, existing on
> >> both hosts.
> >>
> >> The issue can be solved by libvirt hook updating the paths before qemu
> >> is started on the destination, but I think the right way to handle this is 
> >> to
> >> have the same path.
> >
> >
> >  You mentioned above that it can be handled in ovirt (c.f., 
> > /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42), which seems like a 
> > reasonable approach given the constraint imposed by the oVirt migration 
> > flow you outlined above. What information does vdsm need to create and use 
> > the /var/run/vdsm/managedvolumes/{uuid} link? Today the connector does 
> > (trimmed for brevity):
> >
> > def connect_volume(self, connection_properties):
> > device_info = {'type': 'block'}
> > uuid = connection_properties['uuid']
> > device_path = self._get_device_by_uuid(uuid)
> > device_info['path'] = device_path
> > return device_info
>
> I think we have 2 options:
>
> 1. unique path created by os_brick using the underlying uuid
>
> In this case the connector will return the uuid, and ovirt will use
> it to resolve the unique path that will be stored and used on engine
> side to create the vm xml.
>
> I'm not sure how the connector should return this uuid. Looking in current
> vdsm code:
>
> if vol_type in ("iscsi", "fibre_channel"):
> if "multipath_id" not in attachment:
> raise se.ManagedVolumeUnsupportedDevice(vol_id, attachment)
> # /dev/mapper/xxxyyy
> return os.path.join(DEV_MAPPER, attachment["multipath_id"])
> elif vol_type == "rbd":
> # /dev/rbd/poolname/volume-vol-id
> return os.path.join(DEV_RBD, connection_info['data']['name'])
>
> os_brick does not have a uniform way to address 
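
A rough sketch of what option 1 could look like on the vdsm side,
extending the path resolution quoted above with an NVMe branch. The
"lightos" volume type, the attachment["uuid"] key, and the use of udev's
/dev/disk/by-id/nvme-uuid.* links are illustrative assumptions, not the
agreed design:

import os

DEV_MAPPER = "/dev/mapper"
DEV_RBD = "/dev/rbd"
DEV_NVME_BY_ID = "/dev/disk/by-id"

def resolve_stable_path(vol_type, attachment, connection_info):
    """Return a cluster-unique path for an attached managed volume.

    Sketch only: the "lightos" branch and attachment["uuid"] are
    assumptions, not the actual os-brick/vdsm contract.
    """
    if vol_type in ("iscsi", "fibre_channel"):
        # /dev/mapper/{wwid} -- unique because friendly names are disabled.
        return os.path.join(DEV_MAPPER, attachment["multipath_id"])
    elif vol_type == "rbd":
        # /dev/rbd/poolname/volume-vol-id
        return os.path.join(DEV_RBD, connection_info["data"]["name"])
    elif vol_type == "lightos":
        # Hypothetical: rely on the udev-created stable link for the
        # namespace UUID, which is identical on every host even though
        # the friendly /dev/nvmeXnY name is not.
        return os.path.join(DEV_NVME_BY_ID, "nvme-uuid." + attachment["uuid"])
    raise RuntimeError("unsupported volume type %r" % vol_type)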

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Gorka Eguileor
On 24/02, Nir Soffer wrote:
> On Thu, Feb 24, 2022 at 8:46 PM Gorka Eguileor  wrote:
> >
> > On 24/02, Nir Soffer wrote:
> > > On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda  
> > > wrote:
> > > >
> > > > On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer  wrote:
> > > >>
> > > >> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda 
> > > >>  wrote:
> > > >> >
> > > >> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer  
> > > >> > wrote:
> > > >> >>
> > > >> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda 
> > > >> >>  wrote:
> > > >> >> >
> > > >> >> > Thanks for the detailed instructions, Nir. I'm going to scrounge 
> > > >> >> > up some hardware.
> > > >> >> > By the way, if anyone else would like to work on NVMe/TCP 
> > > >> >> > support, for NVMe/TCP target you can either use Lightbits (talk 
> > > >> >> > to me offline for details) or use the upstream Linux NVMe/TCP 
> > > >> >> > target. Lightbits is a clustered storage system while upstream is 
> > > >> >> > a single target, but the client side should be close enough for 
> > > >> >> > vdsm/ovirt purposes.
> > > >> >>
> > > >> >> I played with NVMe/TCP a little bit, using qemu to create a virtual
> > > >> >> NVMe disk, and export
> > > >> >> it using the kernel on one VM, and consume it on another VM.
> > > >> >> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
> > > >> >>
> > > >> >> One question about device naming - do we always get the same name 
> > > >> >> of the
> > > >> >> device in all hosts?
> > > >> >
> > > >> >
> > > >> > No, we do not, see below how we handle migration in os_brick.
> > > >> >
> > > >> >> To support VM migration, every device must have unique name in the 
> > > >> >> cluster.
> > > >> >> With multipath we always have unique name, since we disable 
> > > >> >> "friendly names",
> > > >> >> so we always have:
> > > >> >>
> > > >> >> /dev/mapper/{wwid}
> > > >> >>
> > > >> >> With rbd we also do not use /dev/rbdN but a unique path:
> > > >> >>
> > > >> >> /dev/rbd/poolname/volume-vol-id
> > > >> >>
> > > >> >> How do we ensure cluster-unique device path? If os_brick does not 
> > > >> >> handle it, we
> > > >> >> can to do in ovirt, for example:
> > > >> >>
> > > >> >> /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42
> > > >> >>
> > > >> >> but I think this should be handled in cinderlib, since openstack 
> > > >> >> have
> > > >> >> the same problem with migration.
> > > >> >
> > > >> >
> > > >> > Indeed. Both the Lightbits LightOS connector and the nvmeof 
> > > >> > connector do this through the target provided namespace (LUN) UUID. 
> > > >> > After connecting to the target, the connectors wait for the local 
> > > >> > friendly-named device file that has the right UUID to show up, and 
> > > >> > then return the friendly name. So different hosts will have 
> > > >> > different friendly names, but the VMs will be attached to the right 
> > > >> > namespace since we return the friendly name on the current host that 
> > > >> > has the right UUID. Does this also work for you?
> > > >>
> > > >> It will not work for oVirt.
> > > >>
> > > >> Migration in oVirt works like this:
> > > >>
> > > >> 1. Attach disks to destination host
> > > >> 2. Send VM XML from source host to destination host, and start the
> > > >>VM is paused mode
> > > >> 3. Start the migration on the source host
> > > >> 4. When migration is done, start the CPU on the destination host
> > > >> 5. Detach the disks from the source
> > > >>
> > > >> This will break in step 2, since the source xml refer to nvme device
> > > >> that does not exist or already used by another VM.
> > > >
> > > >
> > > > Indeed.
> > > >
> > > >> To make this work, the VM XML must use the same path, existing on
> > > >> both hosts.
> > > >>
> > > >> The issue can be solved by libvirt hook updating the paths before qemu
> > > >> is started on the destination, but I think the right way to handle 
> > > >> this is to
> > > >> have the same path.
> > > >
> > > >
> > > >  You mentioned above that it can be handled in ovirt (c.f., 
> > > > /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42), which seems like a 
> > > > reasonable approach given the constraint imposed by the oVirt migration 
> > > > flow you outlined above. What information does vdsm need to create and 
> > > > use the /var/run/vdsm/managedvolumes/{uuid} link? Today the connector 
> > > > does (trimmed for brevity):
> > > >
> > > > def connect_volume(self, connection_properties):
> > > > device_info = {'type': 'block'}
> > > > uuid = connection_properties['uuid']
> > > > device_path = self._get_device_by_uuid(uuid)
> > > > device_info['path'] = device_path
> > > > return device_info
> > >
> > > I think we have 2 options:
> > >
> > > 1. unique path created by os_brick using the underlying uuid
> > >
> > > In this case the connector will return the uuid, and ovirt will use
> > > it to resolve the unique path that will be stored and used on engine
> > > side 

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Muli Ben-Yehuda
On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer  wrote:

> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda 
> wrote:
> >
> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer  wrote:
> >>
> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda 
> wrote:
> >> >
> >> > Thanks for the detailed instructions, Nir. I'm going to scrounge up
> some hardware.
> >> > By the way, if anyone else would like to work on NVMe/TCP support,
> for NVMe/TCP target you can either use Lightbits (talk to me offline for
> details) or use the upstream Linux NVMe/TCP target. Lightbits is a
> clustered storage system while upstream is a single target, but the client
> side should be close enough for vdsm/ovirt purposes.
> >>
> >> I played with NVMe/TCP a little bit, using qemu to create a virtual
> >> NVMe disk, and export
> >> it using the kernel on one VM, and consume it on another VM.
> >> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
> >>
> >> One question about device naming - do we always get the same name of the
> >> device in all hosts?
> >
> >
> > No, we do not, see below how we handle migration in os_brick.
> >
> >> To support VM migration, every device must have unique name in the
> cluster.
> >> With multipath we always have unique name, since we disable "friendly
> names",
> >> so we always have:
> >>
> >> /dev/mapper/{wwid}
> >>
> >> With rbd we also do not use /dev/rbdN but a unique path:
> >>
> >> /dev/rbd/poolname/volume-vol-id
> >>
> >> How do we ensure cluster-unique device path? If os_brick does not
> handle it, we
> >> can to do in ovirt, for example:
> >>
> >> /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42
> >>
> >> but I think this should be handled in cinderlib, since openstack have
> >> the same problem with migration.
> >
> >
> > Indeed. Both the Lightbits LightOS connector and the nvmeof connector do
> this through the target provided namespace (LUN) UUID. After connecting to
> the target, the connectors wait for the local friendly-named device file
> that has the right UUID to show up, and then return the friendly name. So
> different hosts will have different friendly names, but the VMs will be
> attached to the right namespace since we return the friendly name on the
> current host that has the right UUID. Does this also work for you?
>
> It will not work for oVirt.
>
> Migration in oVirt works like this:
>
> 1. Attach disks to destination host
> 2. Send VM XML from source host to destination host, and start the
>VM is paused mode
> 3. Start the migration on the source host
> 4. When migration is done, start the CPU on the destination host
> 5. Detach the disks from the source
>
> This will break in step 2, since the source xml refer to nvme device
> that does not exist or already used by another VM.
>

Indeed.

To make this work, the VM XML must use the same path, existing on
> both hosts.
>
> The issue can be solved by libvirt hook updating the paths before qemu
> is started on the destination, but I think the right way to handle this is
> to
> have the same path.
>

 You mentioned above that it can be handled in ovirt (c.f.,
/run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42), which seems like a
reasonable approach given the constraint imposed by the oVirt migration
flow you outlined above. What information does vdsm need to create and use
the /var/run/vdsm/managedvolumes/{uuid} link? Today the connector does
(trimmed for brevity):

def connect_volume(self, connection_properties):
device_info = {'type': 'block'}
uuid = connection_properties['uuid']
device_path = self._get_device_by_uuid(uuid)
device_info['path'] = device_path
return device_info
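
To illustrate the other half of that question, a sketch of the link
creation vdsm could do with only the volume id it already has and the
path the connector returns; the /run/vdsm/managedvolumes location and
the function shape are assumptions for illustration, not the final vdsm
implementation:

import os

RUN_DIR = "/run/vdsm/managedvolumes"  # assumed vdsm-owned directory

def create_stable_link(vol_id, device_info):
    """Sketch: link /run/vdsm/managedvolumes/{vol_id} to the local device.

    device_info is the dict returned by connect_volume() above; its
    "path" is host-specific (e.g. /dev/nvme0n2), while vol_id is the
    same on every host, so engine can put the link in the VM XML.
    """
    os.makedirs(RUN_DIR, exist_ok=True)
    link = os.path.join(RUN_DIR, vol_id)
    if os.path.islink(link):
        os.unlink(link)  # replace a stale link from a previous attach
    os.symlink(device_info["path"], link)
    return link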

Cheers,
Muli



[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Gorka Eguileor
On 24/02, Nir Soffer wrote:
> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda  
> wrote:
> >
> > Thanks for the detailed instructions, Nir. I'm going to scrounge up some 
> > hardware.
> > By the way, if anyone else would like to work on NVMe/TCP support, for 
> > NVMe/TCP target you can either use Lightbits (talk to me offline for 
> > details) or use the upstream Linux NVMe/TCP target. Lightbits is a 
> > clustered storage system while upstream is a single target, but the client 
> > side should be close enough for vdsm/ovirt purposes.
>
> I played with NVMe/TCP a little bit, using qemu to create a virtual
> NVMe disk, and export
> it using the kernel on one VM, and consume it on another VM.
> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
>

Hi,

You can also use nvmetcli to create nvme-of devices using the kernel's
nvmet.

I haven't tested any cinder NVMe driver with cinderlib yet, but I'll
test it with the LVM driver and nvmet target, since I'm currently
working on improvements/fixes on both the nvmet target and the os-brick
connector.

I have played with both iSCSI and RDMA (using Soft-RoCE) as transport
protocols for NVMe-oF and they worked fine in OpenStack.

Something important to consider when thinking about making it
enterprise-ready is that the NVMe-oF connector in os-brick doesn't
currently support any kind of multipathing, either native (ANA) or via
device mapper. But it's something we'll be working on.

I'll let you know how the cinderlib testing goes, though I already know
that the LVM driver with nvmet has problems on disconnection [1].

[1]: https://bugs.launchpad.net/os-brick/+bug/1961102

> One question about device naming - do we always get the same name of the
> device in all hosts?

Definitely not: depending on the transport protocol used and the
features enabled (such as multipathing), os-brick will return a
different path to the device.

In the case of nvme-of it will return devices like /dev/nvme0n1, which
means controller 0 and namespace 1 on the NVMe host system.

And namespace 1 in the system can actually have a different
namespace id (for example 10). Example from a test system using LVM and
an nvmet target variant I'm working on:

  $ sudo nvme list
  Node           SN                 Model   Namespace  Usage                Format        FW Rev
  -------------- ------------------ ------- ---------- -------------------- ------------- --------
  /dev/nvme0n1   9a9bd17b53e6725f   Linux   11         1.07 GB / 1.07 GB    512 B + 0 B   4.18.0-2
  /dev/nvme0n2   9a9bd17b53e6725f   Linux   10         1.07 GB / 1.07 GB    512 B + 0 B   4.18.0-2


>
> To support VM migration, every device must have unique name in the cluster.
> With multipath we always have unique name, since we disable "friendly names",
> so we always have:
>
> /dev/mapper/{wwid}
>
> With rbd we also do not use /dev/rbdN but a unique path:
>
> /dev/rbd/poolname/volume-vol-id
>
> How do we ensure cluster-unique device path? If os_brick does not handle it, 
> we
> can to do in ovirt, for example:
>
> /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42
>

os-brick will not handle this, but assuming udev rules are working
consistently in both migration systems (source and target) there will be
a symlink in /dev/disk/by-id that is formed using the NVMe UUID of the
volume.

In the example above we have:

  $ ls -l /dev/disk/by-id/nvme*
  lrwxrwxrwx. 1 root root 13 Feb 24 16:30 /dev/disk/by-id/nvme-Linux_9a9bd17b53e6725f -> ../../nvme0n2
  lrwxrwxrwx. 1 root root 13 Feb 24 16:30 /dev/disk/by-id/nvme-uuid.5310ef24-8301-4e38-a8b8-b61cd61d8b36 -> ../../nvme0n2
  lrwxrwxrwx. 1 root root 13 Feb 24 16:30 /dev/disk/by-id/nvme-uuid.e31b8c9c-b943-430e-afa4-55a110341dcb -> ../../nvme0n1

The uuid may not be the volume uuid (it depends on the cinder
driver), but we can find the uuid for a specific nvme device easily
enough:

  $ cat /sys/class/nvme/nvme0/nvme0n2/wwid
  uuid.5310ef24-8301-4e38-a8b8-b61cd61d8b36
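
Tying those two examples together, a small sketch of how a host could map
a known namespace UUID back to its current friendly device node, first via
the udev by-id link and then by scanning the sysfs wwid attributes shown
above; this is illustrative only, not the os-brick implementation:

import glob
import os

def device_for_namespace_uuid(ns_uuid):
    """Map an NVMe namespace UUID to the host-local /dev/nvmeXnY node."""
    # Prefer the udev-created stable link.
    by_id = "/dev/disk/by-id/nvme-uuid.%s" % ns_uuid
    if os.path.exists(by_id):
        return os.path.realpath(by_id)            # e.g. /dev/nvme0n2
    # Fall back to scanning the wwid attribute under sysfs.
    for wwid_path in glob.glob("/sys/class/nvme/nvme*/nvme*/wwid"):
        with open(wwid_path) as f:
            if f.read().strip() == "uuid." + ns_uuid:
                # .../nvme0/nvme0n2/wwid -> /dev/nvme0n2
                return "/dev/" + os.path.basename(os.path.dirname(wwid_path))
    raise LookupError("no NVMe namespace with uuid %s on this host" % ns_uuid)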


> but I think this should be handled in cinderlib, since openstack have
> the same problem
> with migration.

OpenStack doesn't have that problem with migrations.

In OpenStack we don't care where the device appears, because nova knows
the volume id of the volume before calling os-brick to connect to it,
and then when os-brick returns the path it knows it belongs to that
specific volume.

Cheers,
Gorka.

>
> Nir
>
> >
> > Cheers,
> > Muli
> > --
> > Muli Ben-Yehuda
> > Co-Founder and Chief Scientist @ http://www.lightbitslabs.com
> > LightOS: The Special Storage Sauce For Your Cloud
> >
> >
> > On Wed, Feb 23, 2022 at 4:55 PM Nir Soffer  wrote:
> >>
> >> On Wed, Feb 23, 2022 at 4:20 PM Muli Ben-Yehuda  
> >> wrote:
> >> >
> >> > Thanks, Nir and Benny (nice to run into you again, Nir!). I'm a neophyte 
> >> > in ovirt and vdsm... What's the simplest way to set up a development 
> >> > environment? Is it possible to set up a "standalone" vdsm environment to 
> 

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Muli Ben-Yehuda
Sounds almost too good to be true :-)
We are working on setting up a development environment. Once it's up we'll
give this patch a spin and report back, probably early next week.

Cheers,
Muli
--
Muli Ben-Yehuda
Co-Founder and Chief Scientist @ http://www.lightbitslabs.com
LightOS: The Special Storage Sauce For Your Cloud


On Wed, Feb 23, 2022 at 7:26 PM Benny Zlotnik  wrote:

> Switching to +devel
>
> Browsing the code a bit, I think a good starting point would be
>
> https://github.com/oVirt/vdsm/blob/500c035903dd35180d71c97791e0ce4356fb77ad/lib/vdsm/storage/managedvolume.py#L110
> -if volume_type not in ("rbd", "iscsi"):
> +if volume_type not in ("rbd", "iscsi", "lightos"):
>
> And it's possible that it would be all that's required to make it work
> on ovirt >=4.5
>
>
> https://github.com/oVirt/vdsm/blob/500c035903dd35180d71c97791e0ce4356fb77ad/lib/vdsm/storage/managedvolume.py#L239
> The path resolver might warn, but it looks like it would return
> /dev/nvme0n so the generated udev rule should come out correct
>
> I'm happy to assist if it turns out to be more challenging (and if not) :)
>
>



[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Muli Ben-Yehuda
On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer  wrote:

> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda 
> wrote:
> >
> > Thanks for the detailed instructions, Nir. I'm going to scrounge up some
> hardware.
> > By the way, if anyone else would like to work on NVMe/TCP support, for
> NVMe/TCP target you can either use Lightbits (talk to me offline for
> details) or use the upstream Linux NVMe/TCP target. Lightbits is a
> clustered storage system while upstream is a single target, but the client
> side should be close enough for vdsm/ovirt purposes.
>
> I played with NVMe/TCP a little bit, using qemu to create a virtual
> NVMe disk, and export
> it using the kernel on one VM, and consume it on another VM.
> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
>
> One question about device naming - do we always get the same name of the
> device in all hosts?
>

No, we do not; see below for how we handle migration in os_brick.

To support VM migration, every device must have unique name in the cluster.
> With multipath we always have unique name, since we disable "friendly
> names",
> so we always have:
>
> /dev/mapper/{wwid}
>
> With rbd we also do not use /dev/rbdN but a unique path:
>
> /dev/rbd/poolname/volume-vol-id
>
> How do we ensure cluster-unique device path? If os_brick does not handle
> it, we
> can to do in ovirt, for example:
>
> /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42
>
> but I think this should be handled in cinderlib, since openstack have
> the same problem with migration.
>

Indeed. Both the Lightbits LightOS connector and the nvmeof connector do
this through the target provided namespace (LUN) UUID. After connecting to
the target, the connectors wait for the local friendly-named device file
that has the right UUID to show up, and then return the friendly name. So
different hosts will have different friendly names, but the VMs will be
attached to the right namespace since we return the friendly name on the
current host that has the right UUID. Does this also work for you?

Cheers,
Muli



[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Nir Soffer
On Fri, Feb 25, 2022 at 12:04 PM Gorka Eguileor  wrote:
>
> On 24/02, Nir Soffer wrote:
> > On Thu, Feb 24, 2022 at 8:46 PM Gorka Eguileor  wrote:
> > >
> > > On 24/02, Nir Soffer wrote:
> > > > On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda 
> > > >  wrote:
> > > > >
> > > > > On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer  wrote:
> > > > >>
> > > > >> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda 
> > > > >>  wrote:
> > > > >> >
> > > > >> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer  
> > > > >> > wrote:
> > > > >> >>
> > > > >> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda 
> > > > >> >>  wrote:
> > > > >> >> >
> > > > >> >> > Thanks for the detailed instructions, Nir. I'm going to 
> > > > >> >> > scrounge up some hardware.
> > > > >> >> > By the way, if anyone else would like to work on NVMe/TCP 
> > > > >> >> > support, for NVMe/TCP target you can either use Lightbits (talk 
> > > > >> >> > to me offline for details) or use the upstream Linux NVMe/TCP 
> > > > >> >> > target. Lightbits is a clustered storage system while upstream 
> > > > >> >> > is a single target, but the client side should be close enough 
> > > > >> >> > for vdsm/ovirt purposes.
> > > > >> >>
> > > > >> >> I played with NVMe/TCP a little bit, using qemu to create a 
> > > > >> >> virtual
> > > > >> >> NVMe disk, and export
> > > > >> >> it using the kernel on one VM, and consume it on another VM.
> > > > >> >> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
> > > > >> >>
> > > > >> >> One question about device naming - do we always get the same name 
> > > > >> >> of the
> > > > >> >> device in all hosts?
> > > > >> >
> > > > >> >
> > > > >> > No, we do not, see below how we handle migration in os_brick.
> > > > >> >
> > > > >> >> To support VM migration, every device must have unique name in 
> > > > >> >> the cluster.
> > > > >> >> With multipath we always have unique name, since we disable 
> > > > >> >> "friendly names",
> > > > >> >> so we always have:
> > > > >> >>
> > > > >> >> /dev/mapper/{wwid}
> > > > >> >>
> > > > >> >> With rbd we also do not use /dev/rbdN but a unique path:
> > > > >> >>
> > > > >> >> /dev/rbd/poolname/volume-vol-id
> > > > >> >>
> > > > >> >> How do we ensure cluster-unique device path? If os_brick does not 
> > > > >> >> handle it, we
> > > > >> >> can to do in ovirt, for example:
> > > > >> >>
> > > > >> >> /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42
> > > > >> >>
> > > > >> >> but I think this should be handled in cinderlib, since openstack 
> > > > >> >> have
> > > > >> >> the same problem with migration.
> > > > >> >
> > > > >> >
> > > > >> > Indeed. Both the Lightbits LightOS connector and the nvmeof 
> > > > >> > connector do this through the target provided namespace (LUN) 
> > > > >> > UUID. After connecting to the target, the connectors wait for the 
> > > > >> > local friendly-named device file that has the right UUID to show 
> > > > >> > up, and then return the friendly name. So different hosts will 
> > > > >> > have different friendly names, but the VMs will be attached to the 
> > > > >> > right namespace since we return the friendly name on the current 
> > > > >> > host that has the right UUID. Does this also work for you?
> > > > >>
> > > > >> It will not work for oVirt.
> > > > >>
> > > > >> Migration in oVirt works like this:
> > > > >>
> > > > >> 1. Attach disks to destination host
> > > > >> 2. Send VM XML from source host to destination host, and start the
> > > > >>VM is paused mode
> > > > >> 3. Start the migration on the source host
> > > > >> 4. When migration is done, start the CPU on the destination host
> > > > >> 5. Detach the disks from the source
> > > > >>
> > > > >> This will break in step 2, since the source xml refer to nvme device
> > > > >> that does not exist or already used by another VM.
> > > > >
> > > > >
> > > > > Indeed.
> > > > >
> > > > >> To make this work, the VM XML must use the same path, existing on
> > > > >> both hosts.
> > > > >>
> > > > >> The issue can be solved by libvirt hook updating the paths before 
> > > > >> qemu
> > > > >> is started on the destination, but I think the right way to handle 
> > > > >> this is to
> > > > >> have the same path.
> > > > >
> > > > >
> > > > >  You mentioned above that it can be handled in ovirt (c.f., 
> > > > > /run/vdsm/mangedvolumes/{uuid} -> /dev/nvme7n42), which seems like a 
> > > > > reasonable approach given the constraint imposed by the oVirt 
> > > > > migration flow you outlined above. What information does vdsm need to 
> > > > > create and use the /var/run/vdsm/managedvolumes/{uuid} link? Today 
> > > > > the connector does (trimmed for brevity):
> > > > >
> > > > > def connect_volume(self, connection_properties):
> > > > > device_info = {'type': 'block'}
> > > > > uuid = connection_properties['uuid']
> > > > > device_path = self._get_device_by_uuid(uuid)
> > > > > device_info['path'] = device_path
> > > > >  

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-03-02 Thread Benny Zlotnik
Hi,

Just by browsing the code, I can think of one issue in [1], a result
of [2], where we only considered iscsi and rbd drivers. Based on the
issue in the 4.3 logs I went over, I suspect your driver will go into
this branch:
backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java

} else if (managedBlockStorageDisk.getCinderVolumeDriver() == CinderVolumeDriver.BLOCK) {
    Map<String, Object> attachment =
            (Map<String, Object>) managedBlockStorageDisk.getDevice().get(DeviceInfoReturn.ATTACHMENT);
    metadata = Map.of(
            "GUID", (String) attachment.get(DeviceInfoReturn.SCSI_WWN),
            "managed", "true"

This will make it go into the wrong branch in clientIF.py, appending
the empty GUID to /dev/mapper. Perhaps it is possible to work around it
in clientIF, if you just want to try and get the VM started for now, by
checking whether GUID is empty and deferring to:
    volPath = drive['path']

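As a concrete sketch of that temporary workaround (the drive dict keys
here are illustrative, not the exact vdsm code):

import os

def resolve_drive_path(drive):
    """Sketch of the workaround, not the actual clientIF.py code."""
    guid = drive.get("GUID", "")
    if not guid:
        # Engine sent an empty GUID (non-iscsi/rbd managed volume): fall
        # back to the path reported by the os-brick connector.
        return drive["path"]
    return os.path.join("/dev/mapper", guid)
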
But as discussed in this thread, our attempt at constructing the
stable paths ourselves doesn't really scale. After further discussion
with Nir, I started working on creating a link in vdsm, in
managedvolume.py#attach_volume, to the path returned by the driver, and
engine will use our link to run the VMs.
This should simplify the code and resolve the live VM migration issue.
I had some preliminary success with this, so I'll try to post the
patches soon.


[1] 
https://github.com/oVirt/vdsm/blob/d957a06a4d988489c83da171fcd9cfd254b12ca4/lib/vdsm/clientIF.py#L462
[2] 
https://github.com/oVirt/ovirt-engine/blob/24530d17874e20581deee4b0e319146cdcacb8f5/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L2424
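
The link-based approach described above could look roughly like the sketch
below; the directory and helper names are illustrative, not the actual
managedvolume.py patch:

    import os

    # Stable-link directory discussed in this thread; helper names are
    # made up for this sketch.
    LINKS_DIR = "/run/vdsm/managedvolumes"

    def add_stable_link(vol_id, device_path):
        # Create /run/vdsm/managedvolumes/{vol_id} -> device_path so engine
        # can put the same path in the VM XML on every host.
        os.makedirs(LINKS_DIR, exist_ok=True)
        link = os.path.join(LINKS_DIR, vol_id)
        if os.path.islink(link):
            os.unlink(link)
        os.symlink(device_path, link)
        return link

    def remove_stable_link(vol_id):
        # Drop the link when the volume is detached.
        link = os.path.join(LINKS_DIR, vol_id)
        if os.path.islink(link):
            os.unlink(link)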










On Tue, Mar 1, 2022 at 6:12 PM Muli Ben-Yehuda  wrote:
>
> Will this support require changes in ovirt-engine or just in vdsm? I have 
> started to look into vdsm's managedvolume.py and its tests and it seems like 
> adding support for LightOS there should be pretty simple (famous last 
> words...). Should this be enough or do you think it will require changes in 
> other parts of ovirt as well?
>
> Cheers,
> Muli
>
> On Mon, Feb 28, 2022 at 9:09 AM Nir Soffer  wrote:
>>
>> On Fri, Feb 25, 2022 at 12:04 PM Gorka Eguileor  wrote:
>> >
>> > On 24/02, Nir Soffer wrote:
>> > > On Thu, Feb 24, 2022 at 8:46 PM Gorka Eguileor  
>> > > wrote:
>> > > >
>> > > > On 24/02, Nir Soffer wrote:
>> > > > > On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda 
>> > > > >  wrote:
>> > > > > >
>> > > > > > On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer  
>> > > > > > wrote:
>> > > > > >>
>> > > > > >> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda 
>> > > > > >>  wrote:
>> > > > > >> >
>> > > > > >> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer  
>> > > > > >> > wrote:
>> > > > > >> >>
>> > > > > >> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda 
>> > > > > >> >>  wrote:
>> > > > > >> >> >
>> > > > > >> >> > Thanks for the detailed instructions, Nir. I'm going to 
>> > > > > >> >> > scrounge up some hardware.
>> > > > > >> >> > By the way, if anyone else would like to work on NVMe/TCP 
>> > > > > >> >> > support, for NVMe/TCP target you can either use Lightbits 
>> > > > > >> >> > (talk to me offline for details) or use the upstream Linux 
>> > > > > >> >> > NVMe/TCP target. Lightbits is a clustered storage system 
>> > > > > >> >> > while upstream is a single target, but the client side 
>> > > > > >> >> > should be close enough for vdsm/ovirt purposes.
>> > > > > >> >>
>> > > > > >> >> I played with NVMe/TCP a little bit, using qemu to create a 
>> > > > > >> >> virtual
>> > > > > >> >> NVMe disk, and export
>> > > > > >> >> it using the kernel on one VM, and consume it on another VM.
>> > > > > >> >> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
>> > > > > >> >>
>> > > > > >> >> One question about device naming - do we always get the same 
>> > > > > >> >> name of the
>> > > > > >> >> device in all hosts?
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > No, we do not, see below how we handle migration in os_brick.
>> > > > > >> >
> > > > >> >> To support VM migration, every device must have a unique name in 
> > > > >> >> the cluster.
> > > > >> >> With multipath we always have a unique name, since we disable 
>> > > > > >> >> "friendly names",
>> > > > > >> >> so we always have:
>> > > > > >> >>
>> > > > > >> >> /dev/mapper/{wwid}
>> > > > > >> >>
>> > > > > >> >> With rbd we also do not use /dev/rbdN but a unique path:
>> > > > > >> >>
>> > > > > >> >> /dev/rbd/poolname/volume-vol-id
>> > > > > >> >>
> > > > >> >> How do we ensure a cluster-unique device path? If os_brick does 
> > > > >> >> not handle it, we
> > > > >> >> can do it in oVirt, for example:
> > > > >> >>
> > > > >> >> /run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42
> > > > >> >>
> > > > >> >> but I think this should be handled 

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-02-24 Thread Nir Soffer
On Thu, Feb 24, 2022 at 8:46 PM Gorka Eguileor  wrote:
>
> On 24/02, Nir Soffer wrote:
> > On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda  
> > wrote:
> > >
> > > On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer  wrote:
> > >>
> > >> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda  
> > >> wrote:
> > >> >
> > >> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer  wrote:
> > >> >>
> > >> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda 
> > >> >>  wrote:
> > >> >> >
> > >> >> > Thanks for the detailed instructions, Nir. I'm going to scrounge up 
> > >> >> > some hardware.
> > >> >> > By the way, if anyone else would like to work on NVMe/TCP support, 
> > >> >> > for NVMe/TCP target you can either use Lightbits (talk to me 
> > >> >> > offline for details) or use the upstream Linux NVMe/TCP target. 
> > >> >> > Lightbits is a clustered storage system while upstream is a single 
> > >> >> > target, but the client side should be close enough for vdsm/ovirt 
> > >> >> > purposes.
> > >> >>
> > >> >> I played with NVMe/TCP a little bit, using qemu to create a virtual
> > >> >> NVMe disk, and export
> > >> >> it using the kernel on one VM, and consume it on another VM.
> > >> >> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
> > >> >>
> > >> >> One question about device naming - do we always get the same name of 
> > >> >> the
> > >> >> device in all hosts?
> > >> >
> > >> >
> > >> > No, we do not, see below how we handle migration in os_brick.
> > >> >
> > >> >> To support VM migration, every device must have a unique name in the 
> > >> >> cluster.
> > >> >> With multipath we always have a unique name, since we disable "friendly 
> > >> >> names",
> > >> >> so we always have:
> > >> >>
> > >> >> /dev/mapper/{wwid}
> > >> >>
> > >> >> With rbd we also do not use /dev/rbdN but a unique path:
> > >> >>
> > >> >> /dev/rbd/poolname/volume-vol-id
> > >> >>
> > >> >> How do we ensure a cluster-unique device path? If os_brick does not 
> > >> >> handle it, we
> > >> >> can do it in oVirt, for example:
> > >> >>
> > >> >> /run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42
> > >> >>
> > >> >> but I think this should be handled in cinderlib, since OpenStack has
> > >> >> the same problem with migration.
> > >> >
> > >> >
> > >> > Indeed. Both the Lightbits LightOS connector and the nvmeof connector 
> > >> > do this through the target provided namespace (LUN) UUID. After 
> > >> > connecting to the target, the connectors wait for the local 
> > >> > friendly-named device file that has the right UUID to show up, and 
> > >> > then return the friendly name. So different hosts will have different 
> > >> > friendly names, but the VMs will be attached to the right namespace 
> > >> > since we return the friendly name on the current host that has the 
> > >> > right UUID. Does this also work for you?
> > >>
> > >> It will not work for oVirt.
> > >>
> > >> Migration in oVirt works like this:
> > >>
> > >> 1. Attach disks to destination host
> > >> 2. Send VM XML from source host to destination host, and start the
> > >>    VM in paused mode
> > >> 3. Start the migration on the source host
> > >> 4. When migration is done, start the CPU on the destination host
> > >> 5. Detach the disks from the source
> > >>
> > >> This will break in step 2, since the source XML refers to an NVMe device
> > >> that does not exist or is already used by another VM.
> > >
> > >
> > > Indeed.
> > >
> > >> To make this work, the VM XML must use the same path, existing on
> > >> both hosts.
> > >>
> > >> The issue can be solved by libvirt hook updating the paths before qemu
> > >> is started on the destination, but I think the right way to handle this 
> > >> is to
> > >> have the same path.
> > >
> > >
> > >  You mentioned above that it can be handled in ovirt (cf. 
> > > /run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42), which seems like a 
> > > reasonable approach given the constraint imposed by the oVirt migration 
> > > flow you outlined above. What information does vdsm need to create and 
> > > use the /var/run/vdsm/managedvolumes/{uuid} link? Today the connector 
> > > does (trimmed for brevity):
> > >
> > > def connect_volume(self, connection_properties):
> > >     device_info = {'type': 'block'}
> > >     uuid = connection_properties['uuid']
> > >     device_path = self._get_device_by_uuid(uuid)
> > >     device_info['path'] = device_path
> > >     return device_info
> >
> > I think we have 2 options:
> >
> > 1. unique path created by os_brick using the underlying uuid
> >
> > In this case the connector will return the uuid, and ovirt will use
> > it to resolve the unique path that will be stored and used on engine
> > side to create the vm xml.
> >
> > I'm not sure how the connector should return this uuid. Looking in current
> > vdsm code:
> >
> > if vol_type in ("iscsi", "fibre_channel"):
> >     if "multipath_id" not in attachment:
> >         raise 

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-02-24 Thread Nir Soffer
On Thu, Feb 24, 2022 at 6:35 PM Muli Ben-Yehuda  wrote:
>
> On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer  wrote:
>>
>> On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda  
>> wrote:
>> >
>> > On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer  wrote:
>> >>
>> >> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda  
>> >> wrote:
>> >> >
>> >> > Thanks for the detailed instructions, Nir. I'm going to scrounge up 
>> >> > some hardware.
>> >> > By the way, if anyone else would like to work on NVMe/TCP support, for 
>> >> > NVMe/TCP target you can either use Lightbits (talk to me offline for 
>> >> > details) or use the upstream Linux NVMe/TCP target. Lightbits is a 
>> >> > clustered storage system while upstream is a single target, but the 
>> >> > client side should be close enough for vdsm/ovirt purposes.
>> >>
>> >> I played with NVMe/TCP a little bit, using qemu to create a virtual
>> >> NVMe disk, and export
>> >> it using the kernel on one VM, and consume it on another VM.
>> >> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
>> >>
>> >> One question about device naming - do we always get the same name of the
>> >> device in all hosts?
>> >
>> >
>> > No, we do not, see below how we handle migration in os_brick.
>> >
>> >> To support VM migration, every device must have a unique name in the 
>> >> cluster.
>> >> With multipath we always have a unique name, since we disable "friendly 
>> >> names",
>> >> so we always have:
>> >>
>> >> /dev/mapper/{wwid}
>> >>
>> >> With rbd we also do not use /dev/rbdN but a unique path:
>> >>
>> >> /dev/rbd/poolname/volume-vol-id
>> >>
>> >> How do we ensure a cluster-unique device path? If os_brick does not handle 
>> >> it, we
>> >> can do it in oVirt, for example:
>> >>
>> >> /run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42
>> >>
>> >> but I think this should be handled in cinderlib, since OpenStack has
>> >> the same problem with migration.
>> >
>> >
>> > Indeed. Both the Lightbits LightOS connector and the nvmeof connector do 
>> > this through the target provided namespace (LUN) UUID. After connecting to 
>> > the target, the connectors wait for the local friendly-named device file 
>> > that has the right UUID to show up, and then return the friendly name. So 
>> > different hosts will have different friendly names, but the VMs will be 
>> > attached to the right namespace since we return the friendly name on the 
>> > current host that has the right UUID. Does this also work for you?
>>
>> It will not work for oVirt.
>>
>> Migration in oVirt works like this:
>>
>> 1. Attach disks to destination host
>> 2. Send VM XML from source host to destination host, and start the
>>    VM in paused mode
>> 3. Start the migration on the source host
>> 4. When migration is done, start the CPU on the destination host
>> 5. Detach the disks from the source
>>
>> This will break in step 2, since the source XML refers to an NVMe device
>> that does not exist or is already used by another VM.
>
>
> Indeed.
>
>> To make this work, the VM XML must use the same path, existing on
>> both hosts.
>>
>> The issue can be solved by libvirt hook updating the paths before qemu
>> is started on the destination, but I think the right way to handle this is to
>> have the same path.
>
>
>  You mentioned above that it can be handled in ovirt (cf. 
> /run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42), which seems like a 
> reasonable approach given the constraint imposed by the oVirt migration flow 
> you outlined above. What information does vdsm need to create and use the 
> /var/run/vdsm/managedvolumes/{uuid} link? Today the connector does (trimmed 
> for brevity):
>
> def connect_volume(self, connection_properties):
>     device_info = {'type': 'block'}
>     uuid = connection_properties['uuid']
>     device_path = self._get_device_by_uuid(uuid)
>     device_info['path'] = device_path
>     return device_info

I think we have 2 options:

1. unique path created by os_brick using the underlying uuid

In this case the connector will return the uuid, and ovirt will use
it to resolve the unique path that will be stored and used on the engine
side to create the VM XML.

I'm not sure how the connector should return this uuid. Looking in current
vdsm code:

if vol_type in ("iscsi", "fibre_channel"):
if "multipath_id" not in attachment:
raise se.ManagedVolumeUnsupportedDevice(vol_id, attachment)
# /dev/mapper/xxxyyy
return os.path.join(DEV_MAPPER, attachment["multipath_id"])
elif vol_type == "rbd":
# /dev/rbd/poolname/volume-vol-id
return os.path.join(DEV_RBD, connection_info['data']['name'])

os_brick does not have a uniform way to address different devices.

Maybe Gorka can help with this.
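
As a rough illustration of option 1, the resolution code quoted above could
grow a uuid-based branch; everything NVMe-specific below (the
"nvmeof"/"lightos" type names and the /dev/disk/by-id/nvme-uuid.* form) is an
assumption of this sketch, not an agreed design:

    import os

    DEV_MAPPER = "/dev/mapper"
    DEV_RBD = "/dev/rbd"
    # udev publishes by-id links for NVMe namespaces; relying on the
    # nvme-uuid.* form here is an assumption of this sketch.
    DEV_DISK_BY_ID = "/dev/disk/by-id"

    def resolve_stable_path(vol_type, attachment, connection_info):
        # Return a path that is identical on every host in the cluster.
        if vol_type in ("iscsi", "fibre_channel"):
            # /dev/mapper/xxxyyy
            return os.path.join(DEV_MAPPER, attachment["multipath_id"])
        if vol_type == "rbd":
            # /dev/rbd/poolname/volume-vol-id
            return os.path.join(DEV_RBD, connection_info["data"]["name"])
        if vol_type in ("nvmeof", "lightos"):
            # same namespace UUID on every host -> same path cluster-wide
            return os.path.join(DEV_DISK_BY_ID,
                                "nvme-uuid." + connection_info["data"]["uuid"])
        raise ValueError("unsupported volume type: %r" % vol_type)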

2. unique path created by oVirt

In this case oVirt will use the disk uuid already used in
ManagedVolume.{attach,detach}_volume APIs:

[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-02-24 Thread Nir Soffer
On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda  wrote:
>
> On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer  wrote:
>>
>> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda  
>> wrote:
>> >
>> > Thanks for the detailed instructions, Nir. I'm going to scrounge up some 
>> > hardware.
>> > By the way, if anyone else would like to work on NVMe/TCP support, for 
>> > NVMe/TCP target you can either use Lightbits (talk to me offline for 
>> > details) or use the upstream Linux NVMe/TCP target. Lightbits is a 
>> > clustered storage system while upstream is a single target, but the client 
>> > side should be close enough for vdsm/ovirt purposes.
>>
>> I played with NVMe/TCP a little bit, using qemu to create a virtual
>> NVMe disk, and export
>> it using the kernel on one VM, and consume it on another VM.
>> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
>>
>> One question about device naming - do we always get the same name of the
>> device in all hosts?
>
>
> No, we do not, see below how we handle migration in os_brick.
>
>> To support VM migration, every device must have a unique name in the cluster.
>> With multipath we always have a unique name, since we disable "friendly names",
>> so we always have:
>>
>> /dev/mapper/{wwid}
>>
>> With rbd we also do not use /dev/rbdN but a unique path:
>>
>> /dev/rbd/poolname/volume-vol-id
>>
>> How do we ensure a cluster-unique device path? If os_brick does not handle it, 
>> we
>> can do it in oVirt, for example:
>>
>> /run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42
>>
>> but I think this should be handled in cinderlib, since OpenStack has
>> the same problem with migration.
>
>
> Indeed. Both the Lightbits LightOS connector and the nvmeof connector do this 
> through the target provided namespace (LUN) UUID. After connecting to the 
> target, the connectors wait for the local friendly-named device file that has 
> the right UUID to show up, and then return the friendly name. So different 
> hosts will have different friendly names, but the VMs will be attached to the 
> right namespace since we return the friendly name on the current host that 
> has the right UUID. Does this also work for you?
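
As a rough sketch of the wait-by-UUID behaviour described above (the sysfs
layout is assumed; this is not the actual os-brick connector code):

    import glob
    import os
    import time

    def wait_for_device_by_uuid(uuid, timeout=10, interval=0.5):
        # Poll sysfs until a local NVMe namespace with the given UUID shows
        # up, then return its device node (e.g. /dev/nvme0n2).
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            for uuid_file in glob.glob("/sys/class/block/nvme*/uuid"):
                try:
                    with open(uuid_file) as f:
                        if f.read().strip().lower() == uuid.lower():
                            name = os.path.basename(os.path.dirname(uuid_file))
                            return os.path.join("/dev", name)
                except OSError:
                    pass  # namespace disappeared while scanning
            time.sleep(interval)
        raise RuntimeError("namespace with uuid %s did not appear" % uuid)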

It will not work for oVirt.

Migration in oVirt works like this:

1. Attach disks to destination host
2. Send VM XML from source host to destination host, and start the
   VM in paused mode
3. Start the migration on the source host
4. When migration is done, start the CPU on the destination host
5. Detach the disks from the source

This will break in step 2, since the source XML refers to an NVMe device
that does not exist or is already used by another VM.

To make this work, the VM XML must use the same path, existing on
both hosts.

The issue can be solved by a libvirt hook updating the paths before qemu
is started on the destination, but I think the right way to handle this is to
have the same path.

Nir


[ovirt-devel] Re: [ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-02-24 Thread Nir Soffer
On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda  wrote:
>
> Thanks for the detailed instructions, Nir. I'm going to scrounge up some 
> hardware.
> By the way, if anyone else would like to work on NVMe/TCP support, for 
> NVMe/TCP target you can either use Lightbits (talk to me offline for details) 
> or use the upstream Linux NVMe/TCP target. Lightbits is a clustered storage 
> system while upstream is a single target, but the client side should be close 
> enough for vdsm/ovirt purposes.

I played with NVMe/TCP a little bit, using qemu to create a virtual
NVMe disk, and export
it using the kernel on one VM, and consume it on another VM.
https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/

One question about device naming - do we always get the same name of the
device in all hosts?

To support VM migration, every device must have a unique name in the cluster.
With multipath we always have a unique name, since we disable "friendly names",
so we always have:

/dev/mapper/{wwid}

With rbd we also do not use /dev/rbdN but a unique path:

/dev/rbd/poolname/volume-vol-id

How do we ensure a cluster-unique device path? If os_brick does not handle it, we
can do it in oVirt, for example:

/run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42

but I think this should be handled in cinderlib, since OpenStack has
the same problem with migration.

Nir

>
> Cheers,
> Muli
> --
> Muli Ben-Yehuda
> Co-Founder and Chief Scientist @ http://www.lightbitslabs.com
> LightOS: The Special Storage Sauce For Your Cloud
>
>
> On Wed, Feb 23, 2022 at 4:55 PM Nir Soffer  wrote:
>>
>> On Wed, Feb 23, 2022 at 4:20 PM Muli Ben-Yehuda  
>> wrote:
>> >
>> > Thanks, Nir and Benny (nice to run into you again, Nir!). I'm a neophyte 
>> > in ovirt and vdsm... What's the simplest way to set up a development 
>> > environment? Is it possible to set up a "standalone" vdsm environment to 
>> > hack support for nvme/tcp or do I need "full ovirt" to make it work?
>>
>> It should be possible to install vdsm on a single host or vm, and use vdsm
>> API to bring the host to the right state, and then attach devices and run
>> vms. But I don't know anyone who can pull this off, since simulating what
>> engine is doing is hard.
>>
>> So the best way is to set up at least one host and engine host using the
>> latest 4.5 rpms, and continue from there. Once you have a host, building
>> vdsm on the host and upgrading the rpms is pretty easy.
>>
>> My preferred setup is to create vms using virt-manager for hosts, engine
>> and storage and run all the vms on my laptop.
>>
>> Note that you must have some traditional storage (NFS/iSCSI) to bring up
>> the system even if you plan to use only managed block storage (MBS).
>> Unfortunately, when we added MBS support we did not have time to fix the huge
>> technical debt so you still need a master storage domain using one of the
>> traditional legacy options.
>>
>> To build a setup, you can use:
>>
>> - engine vm: 6g ram, 2 cpus, centos stream 8
>> - hosts vm: 4g ram, 2 cpus, centos stream 8
>>   you can start with one host and add more hosts later if you want to
>> test migration.
>> - storage vm: 2g ram, 2 cpus, any os you like, I use alpine since it
>> takes very little
>>   memory and its NFS server is fast.
>>
>> See vdsm README for instructions how to setup a host:
>> https://github.com/oVirt/vdsm#manual-installation
>>
>> For engine host you can follow:
>> https://ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/#Enabling_the_Red_Hat_Virtualization_Manager_Repositories_install_RHVM
>>
>> And after that this should work:
>>
>> dnf install ovirt-engine
>> engine-setup
>>
>> Accepting all the defaults should work.
>>
>> When you have engine running, you can add a new host with
>> the ip address or dns name of you host(s) vm, and engine will
>> do everything for you. Note that you must install the ovirt-release-master
>> rpm on the host before you add it to engine.
>>
>> Nir
>>
>> >
>> > Cheers,
>> > Muli
>> > --
>> > Muli Ben-Yehuda
>> > Co-Founder and Chief Scientist @ http://www.lightbitslabs.com
>> > LightOS: The Special Storage Sauce For Your Cloud
>> >
>> >
>> > On Wed, Feb 23, 2022 at 4:16 PM Nir Soffer  wrote:
>> >>
>> >> On Wed, Feb 23, 2022 at 2:48 PM Benny Zlotnik  wrote:
>> >> >
>> >> > So I started looking in the logs and tried to follow along with the
>> >> > code, but things didn't make sense and then I saw it's ovirt 4.3 which
>> >> > makes things more complicated :)
>> >> > Unfortunately, because the GUID is sent in the metadata the volume is
>> >> > treated as a vdsm-managed volume [2] for the udev rule generation and
>> >> > it prepends the /dev/mapper prefix to an empty string as a result.
>> >> > I don't have the vdsm logs, so I am not sure where exactly this fails,
>> >> > but if it's after [4] it may be possible to work around it with a vdsm
>> >> > hook
>> >> >
>> >> > In 4.4.6 we moved the udev rule triggering the volume mapping phase,