Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers

2015-05-08 Thread Joshua Harlow

Will release that (0.10.1) when I get into work this morning.

Needed to sleep ;)

-Josh

Kerr, Andrew wrote:

The problem is in the version of taskflow that devstack downloads from
PyPI. You will need to wait until a version newer than 0.10.0 (the
0.10.1 release that carries the fix) is available [1]

[1] https://pypi.python.org/pypi/taskflow/
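
A quick way to tell whether an environment already has a fixed release
is sketched below. It assumes the fix first ships in 0.10.1 (as stated
later in this thread), uses the stdlib importlib.metadata (Python 3.8+),
and only handles plain dotted numeric versions. parse_version, has_fix
and installed_taskflow_version are hypothetical helper names.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.10.1' into (0, 10, 1); plain numeric versions only."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed, first_fixed="0.10.1"):
    """Compare numerically, so that 0.9.9 sorts before 0.10.1."""
    return parse_version(installed) >= parse_version(first_fixed)


def installed_taskflow_version():
    """Return the installed taskflow version string, or None if absent."""
    try:
        return metadata.version("taskflow")
    except metadata.PackageNotFoundError:
        return None


print("installed taskflow:", installed_taskflow_version())
print(has_fix("0.10.0"))  # False
print(has_fix("0.10.1"))  # True
```

Comparing tuples of integers rather than raw strings matters here,
because a string comparison would rank "0.9.9" above "0.10.1".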

Andrew Kerr
OpenStack QA
Cloud Solutions Group
NetApp

From: Bharat Kumar bharat.kobag...@redhat.com
Reply-To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Date: Friday, May 8, 2015 at 7:37 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with
NetApp NFS drivers

The GlusterFS CI job is still failing with the same issue.

I ran a couple of rechecks on [1] after the
https://review.openstack.org/#/c/181288/ patch was merged.

The GlusterFS CI job still fails with the error below [2]:
ObjectDereferencedError: Can't emit change event for attribute
'Volume.provider_location' - parent object of type Volume has been
garbage collected.

I see the same behaviour with the NetApp CI.


[1] https://review.openstack.org/#/c/165424/
[2]
http://logs.openstack.org/24/165424/6/check/check-tempest-dsvm-full-glusterfs-nv/f386477/logs/screen-c-vol.txt.gz


On 05/08/2015 10:21 AM, Joshua Harlow wrote:

Alright, my hunch was right: there is a small bug in the new
algorithm that makes the storage layer more reliable by doing
copy-original,mutate-copy,save-copy,update-original (vs
update-original,save-original).

https://bugs.launchpad.net/taskflow/+bug/1452978 is opened and a
one-line fix is up at https://review.openstack.org/#/c/181288/ to stop
trying to copy task results (the copy was activating logic that must
have caused the reference to drop out of existence, and therefore the
issue noted below).
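
The failure mode described above can be imitated without taskflow or
SQLAlchemy. The sketch below is a stdlib-only illustration under stated
assumptions: FakeState, FakeVolume and ParentGoneError are hypothetical
stand-ins (not SQLAlchemy classes) for an ORM object whose
change-tracking state holds only a weak reference to its parent. A
shallow copy shares that state, so once the original is dropped the copy
can no longer emit change events. Whether this is exactly the path taken
inside taskflow 0.10.0 is an assumption.

```python
import copy
import gc
import weakref


class ParentGoneError(Exception):
    """Hypothetical stand-in for an 'object dereferenced' ORM error."""


class FakeState(object):
    """Change-tracking state that only weakly references its parent."""

    def __init__(self, parent):
        self._parent_ref = weakref.ref(parent)

    def emit_change(self, attribute):
        if self._parent_ref() is None:
            raise ParentGoneError(
                "Can't emit change event for attribute %r - parent object "
                "has been garbage collected." % attribute)


class FakeVolume(object):
    def __init__(self):
        # Bypass __setattr__ so that installing the state emits no event.
        object.__setattr__(self, "_state", FakeState(self))

    def __setattr__(self, name, value):
        self._state.emit_change(name)
        object.__setattr__(self, name, value)


def mutate_copy_after_original_is_gone():
    original = FakeVolume()
    # copy.copy() duplicates __dict__ shallowly, so the copy shares the
    # original's FakeState, including its weak reference to the original.
    duplicate = copy.copy(original)
    del original
    gc.collect()
    try:
        duplicate.provider_location = "nfs-host:/export"
    except ParentGoneError as exc:
        return str(exc)
    return None


print(mutate_copy_after_original_is_gone())
```

As long as something keeps the original alive, assigning to the copy
works, which is one reason this kind of bug can look intermittent.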

Will get that released in 0.10.1 once it flushes through the pipeline.

Thanks Alex for helping double-check. If others want to check too,
that'd be nice, so we can make sure that's the root cause (overzealous
usage of copy.copy, ha).

Overall I'd still *highly* recommend that the following happen:

 One way to get around whatever the issue is would be to change the
 drivers to not update the object directly as it is not needed. But
 this should not fail. Perhaps a more proper fix is for the volume
 manager to not pass around sqlalchemy objects.

But that can be a later tweak that cinder does; using any taskflow
engine that isn't the greenthreaded/threaded/serial engine will
require results to be serializable, and therefore copyable, so that
those results can go across IPC or MQ/other boundaries. Sqlalchemy
objects won't fit either of these cases (obviously).
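
A rough way to see the constraint: a task result that has to cross a
process or message-queue boundary must survive a serialize/deserialize
round trip. The sketch below uses the stdlib json module as a stand-in
for whatever serializer an engine actually uses (an assumption).
FakeVolumeRow, is_json_serializable and volume_to_primitive are
hypothetical names, not cinder or taskflow APIs.

```python
import json


class FakeVolumeRow(object):
    """Hypothetical stand-in for an ORM-backed Volume row."""

    def __init__(self, volume_id, provider_location):
        self.id = volume_id
        self.provider_location = provider_location


def is_json_serializable(result):
    """Return True if the result survives a JSON round trip unchanged."""
    try:
        return json.loads(json.dumps(result)) == result
    except (TypeError, ValueError):
        return False


def volume_to_primitive(volume):
    """Extract only plain values that are safe to send across processes."""
    return {"id": volume.id, "provider_location": volume.provider_location}


row = FakeVolumeRow("vol-1", "nfs-host:/export")
print(is_json_serializable(row))                       # False
print(is_json_serializable(volume_to_primitive(row)))  # True
```

Returning primitives from tasks keeps the same flow usable on engines
that ship results to other workers.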

-Josh

Joshua Harlow wrote:

Are we sure this is taskflow? I'm wondering since those errors are more
from task code (which is in cinder) and the following seems to be a
general garbage collection issue (not connected to taskflow?):

'''Exception during message handling: Can't emit change event for
attribute 'Volume.provider_location' - parent object of type Volume
has been garbage collected.'''

Or:

'''2015-05-07 22:42:51.142 17040 TRACE oslo_messaging.rpc.dispatcher
ObjectDereferencedError: Can't emit change event for attribute
'Volume.provider_location' - parent object of type Volume has been
garbage collected.'''

Alex Meade wrote:

So it seems that this will break a number of drivers; I see that
glusterfs does the same thing.

On Thu, May 7, 2015 at 10:29 PM, Alex Meade mr.alex.me...@gmail.com
wrote:

It appears that the release of taskflow 0.10.0 exposed an issue in
the NetApp NFS drivers. Something changed that caused the sqlalchemy
Volume object to be garbage collected even though it is passed into
create_volume()

An example error can be found in the c-vol logs here:

http://dcf901611175aa43f968-c54047c910227e27e1d6f03bb1796fd7.r95.cf5.rackcdn.com/57/181157/1/check/cinder-cDOT-NFS/0473c54/


One way to get around whatever the issue is would be to change the
drivers to not update the object directly as it is not needed. But
this should not fail. Perhaps a more proper fix is for the volume
manager to not pass around sqlalchemy objects.


+1
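
The first suggestion, having the driver report changes instead of
mutating the object it was handed, might look like the sketch below. All
names here (SketchNfsDriver, SketchVolumeManager, the save_update
callable) are hypothetical and only show the shape of the idea; this is
not the NetApp driver or the cinder volume manager.

```python
class SketchNfsDriver(object):
    """Hypothetical driver that reports changes instead of mutating input."""

    def create_volume(self, volume):
        # Read from the volume, but never assign to its attributes.
        share = "nfs-host:/export/%s" % volume["id"]
        return {"provider_location": share}


class SketchVolumeManager(object):
    """Hypothetical manager that owns persistence of model updates."""

    def __init__(self, driver, save_update):
        self.driver = driver
        self.save_update = save_update  # callable(volume_id, updates)

    def create_volume(self, volume):
        model_update = self.driver.create_volume(volume)
        if model_update:
            self.save_update(volume["id"], model_update)
        return model_update


saved = {}
manager = SketchVolumeManager(
    SketchNfsDriver(),
    lambda vol_id, updates: saved.update({vol_id: updates}))
manager.create_volume({"id": "vol-1", "size": 1})
print(saved)  # {'vol-1': {'provider_location': 'nfs-host:/export/vol-1'}}
```

Because the driver only sees and returns plain dicts, nothing in this
path depends on an ORM object staying alive.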



Something changed in taskflow, however, and we should just
understand if that has other impact.


I'd like to understand that also: the only commit that touched this
stuff is https://github.com/openstack/taskflow/commit/227cf52 (which
basically ensures that a copy of the storage object is modified, then
saved, and only then is the local object updated, instead of updating
the local object and then saving, which leaves
problems/inconsistencies if the save fails).
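
The difference between the two orderings can be sketched as follows.
This is an illustration only, not taskflow's storage code: FlakyBackend,
update_in_place, update_via_copy and try_update are hypothetical names,
and the record is a plain dict.

```python
import copy


class FlakyBackend(object):
    """Hypothetical persistence backend whose save() can be made to fail."""

    def __init__(self, fail=False):
        self.fail = fail
        self.saved = None

    def save(self, record):
        if self.fail:
            raise IOError("save failed")
        self.saved = copy.deepcopy(record)


def update_in_place(local, backend, changes):
    """Old ordering: update-original, save-original."""
    local.update(changes)
    backend.save(local)  # if this raises, local is already modified


def update_via_copy(local, backend, changes):
    """New ordering: copy-original, mutate-copy, save-copy, update-original."""
    candidate = copy.deepcopy(local)
    candidate.update(changes)
    backend.save(candidate)  # if this raises, local is left untouched
    local.clear()
    local.update(candidate)


def try_update(strategy):
    """Run a strategy against a failing backend; return the local record."""
    local = {"state": "PENDING"}
    try:
        strategy(local, FlakyBackend(fail=True), {"state": "SUCCESS"})
    except IOError:
        pass
    return local


print(try_update(update_in_place))  # {'state': 'SUCCESS'}
print(try_update(update_via_copy))  # {'state': 'PENDING'}
```

With the old ordering a failed save leaves the in-memory record claiming
a state that was never persisted; with the new ordering the in-memory
record only changes after the save has succeeded.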



-Alex


__


OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ

Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers

2015-05-08 Thread Bharat Kumar

The GlusterFS CI job is still failing with the same issue.

I ran a couple of rechecks on [1] after the
https://review.openstack.org/#/c/181288/ patch was merged.

The GlusterFS CI job still fails with the error below [2]:
ObjectDereferencedError: Can't emit change event for attribute
'Volume.provider_location' - parent object of type Volume has been
garbage collected.

I see the same behaviour with the NetApp CI.


[1] https://review.openstack.org/#/c/165424/
[2] 
http://logs.openstack.org/24/165424/6/check/check-tempest-dsvm-full-glusterfs-nv/f386477/logs/screen-c-vol.txt.gz





--
Warm Regards,
Bharat Kumar Kobagana
Software Engineer
OpenStack Storage – RedHat India
Mobile - +91 9949278005


Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers

2015-05-08 Thread Kerr, Andrew
The problem is in the version of taskflow that devstack downloads from
PyPI. You will need to wait until a version newer than 0.10.0 (the
0.10.1 release that carries the fix) is available [1]

[1] https://pypi.python.org/pypi/taskflow/

Andrew Kerr
OpenStack QA
Cloud Solutions Group
NetApp


Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers

2015-05-07 Thread Alex Meade
So it seems that this will break a number of drivers; I see that glusterfs
does the same thing.

On Thu, May 7, 2015 at 10:29 PM, Alex Meade mr.alex.me...@gmail.com wrote:

 It appears that the release of taskflow 0.10.0 exposed an issue in the
 NetApp NFS drivers. Something changed that caused the sqlalchemy Volume
 object to be garbage collected even though it is passed into create_volume()

 An example error can be found in the c-vol logs here:


 http://dcf901611175aa43f968-c54047c910227e27e1d6f03bb1796fd7.r95.cf5.rackcdn.com/57/181157/1/check/cinder-cDOT-NFS/0473c54/

 One way to get around whatever the issue is would be to change the drivers
 to not update the object directly as it is not needed. But this should not
 fail. Perhaps a more proper fix is for the volume manager to not pass
 around sqlalchemy objects.

 Something changed in taskflow, however, and we should just understand if
 that has other impact.

 -Alex



Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers

2015-05-07 Thread Joshua Harlow
Alright, my hunch was right: there is a small bug in the new
algorithm that makes the storage layer more reliable by doing
copy-original,mutate-copy,save-copy,update-original (vs
update-original,save-original).

https://bugs.launchpad.net/taskflow/+bug/1452978 is opened and a
one-line fix is up at https://review.openstack.org/#/c/181288/ to stop
trying to copy task results (the copy was activating logic that must
have caused the reference to drop out of existence, and therefore the
issue noted below).


Will get that released in 0.10.1 once it flushes through the pipeline.

Thanks Alex for helping double-check. If others want to check too,
that'd be nice, so we can make sure that's the root cause (overzealous
usage of copy.copy, ha).

Overall I'd still *highly* recommend that the following happen:

 One way to get around whatever the issue is would be to change the
 drivers to not update the object directly as it is not needed. But
 this should not fail. Perhaps a more proper fix is for the volume
 manager to not pass around sqlalchemy objects.

But that can be a later tweak that cinder does; using any taskflow 
engine that isn't the greenthreaded/threaded/serial engine will require 
results to be serializable, and therefore copyable, so that those 
results can go across IPC or MQ/other boundaries. Sqlalchemy objects 
won't fit either of these cases (obviously).


-Josh
