Re: [openstack-dev] [nova] theoretical race between live migration and resource audit?

2016-06-14 Thread Chris Friesen
Under normal circumstances a bit of resource-tracking error is generally okay.
However, in the case of CPU pinning it's a major problem, because the error isn't
caught at instance boot time, so you end up with two instances that both think
they have exclusive access to one or more host CPUs.


If we get into this scenario, a CPUPinningInvalid exception ends up being raised
during the resource audit, which causes the audit to be aborted.
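
As a toy illustration of why this is worse than ordinary tracking drift (this is
deliberately not the real Nova code; the class name and data structures are made
up for the example):

    class CPUPinningInvalid(Exception):
        pass

    def audit_pinned_cpus(instances):
        """Recompute pinned-CPU usage from scratch, the way an audit would."""
        used = set()
        for inst in instances:
            overlap = used & inst['pinned_cpus']
            if overlap:
                # Two instances both claim exclusive use of these CPUs.  An
                # exception here means the rest of the audit never runs, so
                # the stale usage is never corrected either.
                raise CPUPinningInvalid('CPUs %s pinned twice' % sorted(overlap))
            used |= inst['pinned_cpus']
        return used

    # Both instances believe they own CPU 2:
    audit_pinned_cpus([{'pinned_cpus': {0, 2}}, {'pinned_cpus': {2, 3}}])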


Chris

On 06/10/2016 02:36 AM, Matthew Booth wrote:

Yes, this is a race.

However, it's my understanding that this is 'ok'. The resource tracker doesn't
claim to be 100% accurate at all times, right? Otherwise, why would it update
itself in a periodic task in the first place? It's my understanding that the
resource tracker is basically a best-effort cache, and that scheduling decisions
can still fail at the host. The resource tracker will fix itself the next time it
runs via its periodic task.

Matt (not a scheduler person)



Re: [openstack-dev] [nova] theoretical race between live migration and resource audit?

2016-06-14 Thread Murray, Paul (HP Cloud)
Hi Chris, you are right – and I think there are several synchronization issues 
in the code there.

The migration process should be synchronized, as it is in build_and_run_instance 
etc. At the moment, if pre_live_migration fails due to an RPC timeout (i.e. it 
takes too long for some reason), the rollback can be initiated while it is still 
running. So you can get volume attachments and VIF plugging being set up and 
torn down at the same time, which is not good. [I'm not sure whether that has a 
bug report – I'll file one if not.]
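
As a rough sketch of the kind of per-instance serialization meant here, using 
oslo.concurrency's lockutils (illustrative only; none of the function or lock 
names below are the actual compute manager code):

    from oslo_concurrency import lockutils

    def _instance_lock_name(instance_uuid):
        return 'compute-resources-%s' % instance_uuid

    def pre_live_migration(instance_uuid):
        @lockutils.synchronized(_instance_lock_name(instance_uuid))
        def _do():
            print('plugging VIFs / attaching volumes for %s' % instance_uuid)
        _do()

    def rollback_live_migration(instance_uuid):
        # Takes the same per-instance lock, so a rollback triggered by an RPC
        # timeout cannot tear things down while pre_live_migration is still
        # setting them up.
        @lockutils.synchronized(_instance_lock_name(instance_uuid))
        def _do():
            print('unplugging VIFs / detaching volumes for %s' % instance_uuid)
        _do()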

We need to pay attention to these cases as we refactor the live migration 
process. This has begun with:
https://specs.openstack.org/openstack/nova-specs/specs/newton/approved/async-live-migration-rest-check.html
https://specs.openstack.org/openstack/nova-specs/specs/newton/approved/live_migration_compute_communcation.html

Paul


Re: [openstack-dev] [nova] theoretical race between live migration and resource audit?

2016-06-10 Thread lương hữu tuấn
Hi,

Yes, it is actually a race, and we have already seen its negative effects when
using evacuation: some of the CPU pinning information is lost. Imagine that, in
some cases, we perform a re-scheduling action (evacuate, live migration, etc.)
and then immediately perform the next action (delete, resize, etc.) before the
resource_tracker updates in the next period. In that case, it fails. It also has
some negative effects on writing tests based on the scenario described above.

Br,

Tutj



Re: [openstack-dev] [nova] theoretical race between live migration and resource audit?

2016-06-10 Thread Matthew Booth
Yes, this is a race.

However, it's my understanding that this is 'ok'. The resource tracker
doesn't claim to be 100% accurate at all times, right? Otherwise, why would
it update itself in a periodic task in the first place? It's my understanding
that the resource tracker is basically a best-effort cache, and that
scheduling decisions can still fail at the host. The resource tracker will
fix itself the next time it runs via its periodic task.
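
As a rough sketch of that self-healing pattern (purely illustrative, not the
real resource tracker):

    class ToyResourceTracker(object):
        """Best-effort cache: readers may see stale usage, but the periodic
        task rebuilds it from the authoritative instance list."""

        def __init__(self, list_instances):
            self._list_instances = list_instances  # e.g. a DB query
            self.usage = {}

        def update_available_resource(self):
            # Runs from a periodic task: recompute from scratch and replace
            # the cache, so any drift lasts at most one period.
            self.usage = {inst['uuid']: inst['vcpus']
                          for inst in self._list_instances()}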

Matt (not a scheduler person)




-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)


[openstack-dev] [nova] theoretical race between live migration and resource audit?

2016-06-09 Thread Chris Friesen

Hi,

I'm wondering if we might have a race between live migration and the resource 
audit.  I've included a few people on the recipient list who have worked 
directly with this code in the past.


In _update_available_resource() we have code that looks like this:

instances = objects.InstanceList.get_by_host_and_node()
self._update_usage_from_instances()
migrations = objects.MigrationList.get_in_progress_by_host_and_node()
self._update_usage_from_migrations()


In post_live_migration_at_destination() we do this (updating the host and node 
as well as the task state):

instance.host = self.host
instance.task_state = None
instance.node = node_name
instance.save(expected_task_state=task_states.MIGRATING)


And in _post_live_migration() we update the migration status to "completed":
if migrate_data and migrate_data.get('migration'):
    migrate_data['migration'].status = 'completed'
    migrate_data['migration'].save()


Neither of the latter routines is serialized by the COMPUTE_RESOURCE_SEMAPHORE, 
so they can race against the code in _update_available_resource().



I'm wondering if we can have a situation like this:

1) migration in progress
2) We start running _update_available_resource() on the destination, and we call 
instances = objects.InstanceList.get_by_host_and_node().  This will not include 
the migrating instance, because it is not yet on the destination host.
3) The migration completes and we call post_live_migration_at_destination(), 
which sets the host/node/task_state on the instance.
4) In _update_available_resource() on the destination, we call migrations = 
objects.MigrationList.get_in_progress_by_host_and_node().  This will return the 
migration for the instance in question, but when we run 
self._update_usage_from_migrations() the uuid will not be in "instances", so 
we will use the instance from the newly-queried migration.  We will then ignore 
the instance because it is not in a "migrating" state.


Am I imagining things, or is there a race here?  If so, the negative effect 
would be that the resources of the migrating instance would be "lost", allowing 
a newly-scheduled instance to claim the same resources (PCI devices, pinned 
CPUs, etc.).
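
To make step 4 concrete, here is a simplified sketch of the skip described
above (not the actual Nova code; the helper names are made up):

    def _update_usage_from_migrations(instances_by_uuid, migrations,
                                      get_instance_by_uuid):
        tracked = []
        for migration in migrations:
            instance = instances_by_uuid.get(migration['instance_uuid'])
            if instance is None:
                # The instance wasn't in the earlier get_by_host_and_node()
                # result, so re-fetch it.  By now
                # post_live_migration_at_destination() has already set
                # host/node and cleared task_state.
                instance = get_instance_by_uuid(migration['instance_uuid'])
            if instance.get('task_state') != 'migrating':
                # Skipped: its pinned CPUs / PCI devices never get added to
                # the claimed totals on the destination.
                continue
            tracked.append(instance)
        return tracked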


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev