Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers
Will release that when I get in to work this morning. Needed to sleep ;)

-Josh

Kerr, Andrew wrote:
> The problem is in the version of taskflow that is downloaded from pypi by devstack. You will need to wait until a new version (0.10.1) is available [1].
>
> [1] https://pypi.python.org/pypi/taskflow/
>
> Andrew Kerr
> OpenStack QA
> Cloud Solutions Group
> NetApp
Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers
The GlusterFS CI job is still failing with the same issue. I ran a couple of rechecks on [1] after the patch https://review.openstack.org/#/c/181288/ was merged, but the GlusterFS CI job still fails with the error below [2]:

ObjectDereferencedError: Can't emit change event for attribute 'Volume.provider_location' - parent object of type Volume has been garbage collected.

I see the same behaviour with the NetApp CI.

[1] https://review.openstack.org/#/c/165424/
[2] http://logs.openstack.org/24/165424/6/check/check-tempest-dsvm-full-glusterfs-nv/f386477/logs/screen-c-vol.txt.gz

--
Warm Regards,
Bharat Kumar Kobagana
Software Engineer
OpenStack Storage – RedHat India

On 05/08/2015 10:21 AM, Joshua Harlow wrote:
> Alright, it was as I had a hunch for: a small bug in the new algorithm that makes the storage layer more reliable by doing copy-original, mutate-copy, save-copy, update-original (vs. update-original, save-original).
>
> https://bugs.launchpad.net/taskflow/+bug/1452978 has been opened and a one-line fix made at https://review.openstack.org/#/c/181288/ to stop trying to copy task results (which was activating logic that must have caused the reference to drop out of existence, and therefore the issue noted below). Will get that released in 0.10.1 once it flushes through the pipeline.
>
> Thanks Alex for helping double check. If others want to check too, that'd be nice, so we can make sure that's the root cause (overzealous usage of copy.copy, ha).
>
> Overall I'd still *highly* recommend that the following still happen:
>
>> One way to get around whatever the issue is would be to change the drivers to not update the object directly as it is not needed. But this should not fail. Perhaps a more proper fix is for the volume manager to not pass around sqlalchemy objects.
>
> But that can be a later tweak that cinder does; using any taskflow engine that isn't the greenthreaded/threaded/serial engine will require results to be serializable, and therefore copyable, so that those results can go across IPC or MQ/other boundaries. Sqlalchemy objects won't fit either of these cases (obviously).
>
> -Josh
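The ObjectDereferencedError quoted in this message says that a change event could not find its parent Volume because the parent had been garbage collected. The toy model below, standard library only, shows one way a shallow copy can end up in that state: the copy shares a bookkeeping object that only weakly references the original. Every class and function name here is a hypothetical stand-in. This is not SQLAlchemy's or taskflow's code, and whether it matches exactly what happened with taskflow 0.10.0 is an assumption.

```python
import copy
import gc
import weakref


class ParentCollectedError(Exception):
    """Hypothetical stand-in for SQLAlchemy's ObjectDereferencedError."""


class TrackingState:
    """Bookkeeping object that only weakly references its owner."""

    def __init__(self, owner):
        self._owner_ref = weakref.ref(owner)

    def emit_change(self, attr_name):
        # A change event needs the owner; fail if it has been collected.
        if self._owner_ref() is None:
            raise ParentCollectedError(
                "Can't emit change event for attribute %r - "
                "parent object has been garbage collected." % attr_name)


class ToyVolume:
    """Hypothetical model object; every attribute write emits an event."""

    def __init__(self, provider_location=None):
        # Bypass __setattr__ while the state object is being attached.
        object.__setattr__(self, "_state", TrackingState(self))
        object.__setattr__(self, "provider_location", provider_location)

    def __setattr__(self, name, value):
        self._state.emit_change(name)
        object.__setattr__(self, name, value)


def write_to_copy_after_original_is_gone():
    """Return the error message raised by writing to an orphaned copy."""
    original = ToyVolume("host:/export")
    # copy.copy() duplicates __dict__, so the clone shares the original's
    # TrackingState, whose weak reference still points at the original.
    clone = copy.copy(original)
    del original   # drop the last strong reference to the original
    gc.collect()   # not required on CPython, but makes the intent explicit
    try:
        clone.provider_location = "host:/other"
    except ParentCollectedError as exc:
        return str(exc)
    return None
```

Writing to the original while it is still alive works; only the orphaned copy fails, which is the shape of failure the error message describes.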
Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers
The problem is in the version of taskflow that is downloaded from pypi by devstack. You will need to wait until a new version (0.10.1) is available [1].

[1] https://pypi.python.org/pypi/taskflow/

Andrew Kerr
OpenStack QA
Cloud Solutions Group
NetApp

From: Bharat Kumar bharat.kobag...@redhat.com
Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: Friday, May 8, 2015 at 7:37 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers

> GlusterFS CI job is still failing with the same issue. I gave couple of rechecks on [1], after https://review.openstack.org/#/c/181288/ patch got merged. But still GlusterFS CI job is failing with below error [2]:
>
> ObjectDereferencedError: Can't emit change event for attribute 'Volume.provider_location' - parent object of type Volume has been garbage collected.
>
> Also I found the same behaviour with NetApp CI also.
>
> [1] https://review.openstack.org/#/c/165424/
> [2] http://logs.openstack.org/24/165424/6/check/check-tempest-dsvm-full-glusterfs-nv/f386477/logs/screen-c-vol.txt.gz
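Since the point of this reply is that CI keeps failing until devstack pulls a fixed release from pypi, a quick check of the installed version can save a recheck. The sketch below is only an illustration: the helper names are made up, the version parsing assumes plain dotted integers, and it treats exactly 0.10.0 (the release named in the subject) as the affected one.

```python
from importlib import metadata

# The release named in the thread subject.
AFFECTED = (0, 10, 0)


def parse_version(text):
    """Turn '0.10.1' into (0, 10, 1); assumes plain dotted integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version_text):
    """True only for the exact release discussed in this thread."""
    return parse_version(version_text) == AFFECTED


def installed_taskflow_is_affected():
    """Return True/False, or None if the answer cannot be determined."""
    try:
        return is_affected(metadata.version("taskflow"))
    except metadata.PackageNotFoundError:
        return None   # taskflow is not installed in this environment
    except ValueError:
        return None   # version string is not plain dotted integers
```

Running `installed_taskflow_is_affected()` on the CI node before a recheck would show whether the environment still has the release that triggers the error.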
Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers
So it seems that this will break a number of drivers; I see that glusterfs does the same thing.

On Thu, May 7, 2015 at 10:29 PM, Alex Meade mr.alex.me...@gmail.com wrote:
> It appears that the release of taskflow 0.10.0 exposed an issue in the NetApp NFS drivers. Something changed that caused the sqlalchemy Volume object to be garbage collected even though it is passed into create_volume().
>
> An example error can be found in the c-vol logs here:
> http://dcf901611175aa43f968-c54047c910227e27e1d6f03bb1796fd7.r95.cf5.rackcdn.com/57/181157/1/check/cinder-cDOT-NFS/0473c54/
>
> One way to get around whatever the issue is would be to change the drivers to not update the object directly as it is not needed. But this should not fail. Perhaps a more proper fix is for the volume manager to not pass around sqlalchemy objects.
>
> Something changed in taskflow, however, and we should just understand if that has other impact.
>
> -Alex

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
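The workaround suggested in this message (drivers not updating the passed-in object directly) can be sketched as a driver that returns a plain dict of changed fields and leaves persistence to the caller. The class and function names below are hypothetical and the "database" is just a dict; this is not the NetApp or GlusterFS driver code, only an illustration of the idea under those assumptions.

```python
import copy
import json


class ToyNfsDriver:
    """Hypothetical driver; not the NetApp or GlusterFS implementation."""

    def __init__(self, share):
        self._share = share

    def create_volume(self, volume):
        # Read what is needed from the volume, but never assign to it.
        # (A real driver would create the backing file at this point.)
        _ = volume["name"]
        # Report the changed fields as plain data instead.
        return {"provider_location": self._share}


def apply_model_update(volume_id, model_update, db):
    """Stand-in for the caller persisting the update by volume id."""
    db.setdefault(volume_id, {}).update(model_update)
    return db[volume_id]
```

Because the returned value is a plain dict of strings, it can be copied or serialized without touching any ORM object, which is the property the thread says matters once results cross process boundaries.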
Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers
Alright, it was as I had a hunch for: a small bug in the new algorithm that makes the storage layer more reliable by doing copy-original, mutate-copy, save-copy, update-original (vs. update-original, save-original).

https://bugs.launchpad.net/taskflow/+bug/1452978 has been opened and a one-line fix made at https://review.openstack.org/#/c/181288/ to stop trying to copy task results (which was activating logic that must have caused the reference to drop out of existence, and therefore the issue noted below). Will get that released in 0.10.1 once it flushes through the pipeline.

Thanks Alex for helping double check. If others want to check too, that'd be nice, so we can make sure that's the root cause (overzealous usage of copy.copy, ha).

Overall I'd still *highly* recommend that the following still happen:

> One way to get around whatever the issue is would be to change the drivers to not update the object directly as it is not needed. But this should not fail. Perhaps a more proper fix is for the volume manager to not pass around sqlalchemy objects.

But that can be a later tweak that cinder does; using any taskflow engine that isn't the greenthreaded/threaded/serial engine will require results to be serializable, and therefore copyable, so that those results can go across IPC or MQ/other boundaries. Sqlalchemy objects won't fit either of these cases (obviously).

-Josh

Joshua Harlow wrote:
> Are we sure this is taskflow? I'm wondering since those errors are more from task code (which is in cinder) and the following seems to be a general garbage collection issue (not connected to taskflow?):
>
> '''Exception during message handling: Can't emit change event for attribute 'Volume.provider_location' - parent object of type Volume has been garbage collected.'''
>
> Or:
>
> '''2015-05-07 22:42:51.142 17040 TRACE oslo_messaging.rpc.dispatcher ObjectDereferencedError: Can't emit change event for attribute 'Volume.provider_location' - parent object of type Volume has been garbage collected.'''
>
> Alex Meade wrote:
>> So it seems that this will break a number of drivers, I see that glusterfs does the same thing.
>>
>> On Thu, May 7, 2015 at 10:29 PM, Alex Meade mr.alex.me...@gmail.com wrote:
>>> It appears that the release of taskflow 0.10.0 exposed an issue in the NetApp NFS drivers. Something changed that caused the sqlalchemy Volume object to be garbage collected even though it is passed into create_volume().
>>>
>>> An example error can be found in the c-vol logs here:
>>> http://dcf901611175aa43f968-c54047c910227e27e1d6f03bb1796fd7.r95.cf5.rackcdn.com/57/181157/1/check/cinder-cDOT-NFS/0473c54/
>>>
>>> One way to get around whatever the issue is would be to change the drivers to not update the object directly as it is not needed. But this should not fail. Perhaps a more proper fix is for the volume manager to not pass around sqlalchemy objects.
>
> +1
>
>>> Something changed in taskflow, however, and we should just understand if that has other impact.
>
> I'd like to understand that also: the only commit that touched this stuff is https://github.com/openstack/taskflow/commit/227cf52 (which basically ensured that a storage object copy is modified, then saved, then the local object is updated, vs. updating the local object and then saving, which has problems/inconsistencies if the save fails).
>
>>> -Alex
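The two orderings described in this message (update-original, save-original vs. copy-original, mutate-copy, save-copy, update-original) can be compared with a small sketch. The backend class, the function names and the dict-based record are all hypothetical; this is not taskflow's storage code, only an illustration of why the copy-first ordering behaves better when a save fails.

```python
import copy


class SaveFailed(Exception):
    """Raised by the toy backend when persistence fails."""


class FlakyBackend:
    """Hypothetical persistence backend that can be told to fail."""

    def __init__(self, fail=False):
        self.fail = fail
        self.saved = None

    def save(self, record):
        if self.fail:
            raise SaveFailed("backend unavailable")
        self.saved = copy.deepcopy(record)


def update_then_save(local, changes, backend):
    # Old ordering: mutate the local record first, then persist it.
    # If save() raises, the local record already holds unsaved changes.
    local.update(changes)
    backend.save(local)


def copy_mutate_save_update(local, changes, backend):
    # New ordering: work on a copy and only touch the local record
    # once the save has succeeded.
    candidate = copy.deepcopy(local)
    candidate.update(changes)
    backend.save(candidate)
    local.clear()
    local.update(candidate)
```

With a failing backend, `update_then_save` leaves the local record out of step with what was persisted, while `copy_mutate_save_update` leaves it untouched. The price is that every record passing through has to be copyable, which is where objects such as SQLAlchemy models become a problem.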