Re: [openstack-dev] [cinder][nova][os-brick] Testing for proposed iSCSI OS-Brick code

2017-06-01 Thread Gorka Eguileor
Re: [openstack-dev] [cinder][nova][os-brick] Testing for proposed iSCSI OS-Brick code

2017-05-31 Thread Matt Riedemann

On 5/31/2017 6:58 AM, Gorka Eguileor wrote:

Hi,

As some of you may know I've been working on improving iSCSI connections
on OpenStack to make them more robust and prevent them from leaving
leftovers on attach/detach operations.

There are a couple of posts [1][2] that go into more detail, but in
summary, fixing this issue requires a considerable rework of OS-Brick
together with changes in Open iSCSI, Cinder, Nova, and specific tests.

Relevant changes for those projects are:

- Open iSCSI: iscsid behavior is not a perfect fit for the OpenStack use
   case, so a new feature was added to disable the automatic scans that were
   adding unintended devices to the system.  Done and merged [3][4]; it will
   be available on RHEL with iscsi-initiator-utils-6.2.0.874-2.el7

- OS-Brick: rework iSCSI to make it robust on unreliable networks, to
   add a `force` detach option that prioritizes leaving a clean system
   over possible data loss, and to support the new Open iSCSI feature.
   Done and pending review [5][6][7]

- Cinder: Handle some attach/detach errors a little better and add
   support for the force detach option in operations where data loss
   on error is acceptable, e.g. create volume from image, restore
   backup, etc.  Done and pending review [8][9]

- Nova: I haven't looked into the code here, but I'm sure there will be
   cases where using the force detach operation will be useful.

- Tests: While we do have tempest tests that verify that attach/detach
   operations work, both for Nova attachments and for Cinder volume-creation
   operations, they are not meant to test the robustness of the system, so
   new tests are required to validate the code.  Done [10]
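For context on the Open iSCSI item above, the merged feature is exposed as a
per-node session setting; the setting name below is taken from the upstream
change and should be double-checked against your installed version, and the
target IQN and portal address are placeholders:

```shell
# Request manual scans for a node record instead of iscsid's automatic scans.
# This can also be set globally in /etc/iscsi/iscsid.conf as:
#   node.session.scan = manual
# The IQN and portal below are placeholders for your backend's values.
iscsiadm -m node -T iqn.2017-06.com.example:target \
         -p 192.168.1.20:3260 \
         --op update -n node.session.scan -v manual
```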

Proposed tests are simplified versions of the ones I used to validate
the code; but hey, at least these are somewhat readable ;-)
Unfortunately they are not in line with the tempest mission, since they
are not meant to be run in a production environment due to the disruptive
nature of their error injection.  They need to be run sequentially, without
any other operations running on the deployment, and they run sudo commands
via local bash or SSH for the verification and error-generation bits.

We are testing create volume from image and attaching a volume to an
instance under the following networking error scenarios:

  - No errors
  - All paths have 10% incoming packets dropped
  - All paths have 20% incoming packets dropped
  - All paths have 100% incoming packets dropped
  - Half the paths have 20% incoming packets dropped
  - The other half of the paths have 20% incoming packets dropped
  - Half the paths have 100% incoming packets dropped
  - The other half of the paths have 100% incoming packets dropped

There are single-execution versions as well as variants that run 10
consecutive operations.
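For anyone wanting to reproduce the packet-drop scenarios by hand, the error
injection can be done with the iptables `statistic` match; a minimal sketch,
where the backend IP and drop percentage are placeholders (this is not the
tests' exact implementation, just the underlying mechanism):

```shell
# Build an iptables rule that randomly drops 20% of incoming packets from
# one storage path.  BACKEND_IP is a placeholder for a backend portal address.
BACKEND_IP="192.168.1.20"
RULE="INPUT -s ${BACKEND_IP} -p tcp --sport 3260 -m statistic --mode random --probability 0.20 -j DROP"
echo "iptables -I ${RULE}"    # inspect the full command before running it
# sudo iptables -I ${RULE}    # inject the errors (as root)
# sudo iptables -D ${RULE}    # remove the rule again after the test
```

A 100% drop is simply probability 1.00, and the "half the paths" scenarios
just apply the rule to only some of the portal addresses.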

Since these are big changes, I'm sure we would all feel a lot more
confident merging them if storage vendors ran the new tests to confirm
that there are no issues with their backends.

Unfortunately, to fully test the solution you may need to build and install
the latest Open-iSCSI package on the system; then you can use an all-in-one
DevStack with a couple of changes in local.conf:

enable_service tempest

CINDER_REPO=https://review.openstack.org/p/openstack/cinder
CINDER_BRANCH=refs/changes/45/469445/1

LIBS_FROM_GIT=os-brick

OS_BRICK_REPO=https://review.openstack.org/p/openstack/os-brick
OS_BRICK_BRANCH=refs/changes/94/455394/11

[[post-config|$CINDER_CONF]]
[multipath-backend]
use_multipath_for_image_xfer=true

[[post-config|$NOVA_CONF]]
[libvirt]
volume_use_multipath = True

[[post-config|$KEYSTONE_CONF]]
[token]
expiration = 14400

[[test-config|$TEMPEST_CONFIG]]
[volume-feature-enabled]
multipath = True
[volume]
build_interval = 10
multipath_type = $MULTIPATH_VOLUME_TYPE
backend_protocol_tcp_port = 3260
multipath_backend_addresses = $STORAGE_BACKEND_IP1,$STORAGE_BACKEND_IP2

Multinode configurations are also supported, using SSH with user/password or
a private key to introduce the errors or to check that the systems didn't
leave any leftovers.  The tests can also run a cleanup command between tests,
etc., but that's beyond the scope of this email.

Then you can run them all from /opt/stack/tempest with:

  $ cd /opt/stack/tempest
  $ OS_TEST_TIMEOUT=7200 ostestr -r cinder.tests.tempest.scenario.test_multipath.*

But I would recommend first running the simplest one without errors and
manually checking that the multipath is being created.

  $ ostestr -n cinder.tests.tempest.scenario.test_multipath.TestMultipath.test_create_volume_with_errors_1

Then do the same with one that injects errors, verifying the presence of the
filters in iptables and that the packet-drop counters for those filters are
non-zero:

  $ ostestr -n cinder.tests.tempest.scenario.test_multipath.TestMultipath.test_create_volume_with_errors_2
  $ sudo iptables -nvL INPUT
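If you want to script the non-zero check, the packet counters of the DROP
rules can be summed from the `iptables -nvL` listing; a sketch using a
hard-coded sample of the output format (in a real run, feed it the output of
`sudo iptables -nvL INPUT` instead):

```shell
# Sum the packet counters of DROP rules in an iptables -nvL INPUT listing.
# The here-document below is illustrative sample output, not real data.
cat > /tmp/iptables_sample.txt <<'EOF'
Chain INPUT (policy ACCEPT 1024 packets, 2300K bytes)
 pkts bytes target     prot opt in     out     source               destination
   42  2520 DROP       tcp  --  *      *       192.168.1.20         0.0.0.0/0
EOF
# Column 3 is the target, column 1 the packet counter for that rule.
DROPPED=$(awk '$3 == "DROP" {sum += $1} END {print sum + 0}' /tmp/iptables_sample.txt)
echo "packets dropped: ${DROPPED}"    # non-zero means the filters are matching
```

Note that the awk expression assumes plain integer counters; `iptables -nvxL`
prints exact counts and avoids the K/M suffixes used for large values.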

Then doing the same with a