Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-23 Thread Sergio A. de Carvalho Jr.
Make sense, Dan.

Thanks so much for your help.

Sergio

On Tue, Oct 23, 2018 at 5:01 PM Dan Smith  wrote:

> > I tested a code change that essentially reverts
> > https://review.openstack.org/#/c/276861/1/nova/api/metadata/base.py
> >
> > In other words, with this change metadata tables are not fetched by
> > default in API requests. If I understand correctly, metadata is
> > fetched in separate queries as the instance object is
> > created. Everything seems to work just fine, and I've considerably
> > reduced the amount of data fetched from the database, as well as
> > reduced the average response time of API requests.
> >
> > Given how simple it is and the results I'm getting, I don't see any
> > reason not to patch my clusters with this change.
> >
> > Do you guys see any other impact this change could have? Anything that
> > it could potentially break?
>
> This is probably fine as a bandage fix, but it's not the right one for
> upstream, IMHO. By doing what you did, you cause two RPC round-trips to
> fetch the instance and then the metadata every single time the metadata
> API is hit (not including the cache). By converting the DB load to do
> the two-step, we still hit the DB twice, but only one RPC round-trip,
> which will be much more efficient especially at load/scale.
>
> --Dan
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-23 Thread Dan Smith
> I tested a code change that essentially reverts
> https://review.openstack.org/#/c/276861/1/nova/api/metadata/base.py
>
> In other words, with this change metadata tables are not fetched by
> default in API requests. If I understand correctly, metadata is
> fetched in separate queries as the instance object is
> created. Everything seems to work just fine, and I've considerably
> reduced the amount of data fetched from the database, as well as
> reduced the average response time of API requests.
>
> Given how simple it is and the results I'm getting, I don't see any
> reason not to patch my clusters with this change.
>
> Do you guys see any other impact this change could have? Anything that
> it could potentially break?

This is probably fine as a bandage fix, but it's not the right one for
upstream, IMHO. By doing what you did, you cause two RPC round-trips to
fetch the instance and then the metadata every single time the metadata
API is hit (not including the cache). By converting the DB load to do
the two-step, we still hit the DB twice, but only one RPC round-trip,
which will be much more efficient especially at load/scale.

--Dan

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-23 Thread Sergio A. de Carvalho Jr.
I tested a code change that essentially reverts
https://review.openstack.org/#/c/276861/1/nova/api/metadata/base.py

In other words, with this change metadata tables are not fetched by default
in API requests. If I understand correctly, metadata is fetched in separate
queries as the instance object is created. Everything seems to work just
fine, and I've considerably reduced the amount of data fetched from the
database, as well as reduced the average response time of API requests.

Given how simple it is and the results I'm getting, I don't see any reason
not to patch my clusters with this change.

Do you guys see any other impact this change could have? Anything that it
could potentially break?


On Mon, Oct 22, 2018 at 10:05 PM Sergio A. de Carvalho Jr. <
scarvalh...@gmail.com> wrote:

>
> https://bugs.launchpad.net/nova/+bug/1799298
>
> On Mon, Oct 22, 2018 at 9:15 PM Sergio A. de Carvalho Jr. <
> scarvalh...@gmail.com> wrote:
>
>> Cool, I'll open a bug then.
>>
>> I was wondering if, before joining the metadata tables with the rest of
>> instance data, we could do a UNION, since both tables are structurally
>> identical.
>>
>> On Mon, Oct 22, 2018 at 9:04 PM Dan Smith  wrote:
>>
>>> > Do you guys see an easy fix here?
>>> >
>>> > Should I open a bug report?
>>>
>>> Definitely open a bug. IMHO, we should just make the single-instance
>>> load work like the multi ones, where we load the metadata separately if
>>> requested. We might be able to get away without sysmeta these days, but
>>> we needed it for the flavor details back when the join was added. But,
>>> user metadata is controllable by the user and definitely of interest in
>>> that code, so just dropping sysmeta from the explicit required_attrs
>>> isn't enough, IMHO.
>>>
>>> --Dan
>>>
>>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Sergio A. de Carvalho Jr.
https://bugs.launchpad.net/nova/+bug/1799298

On Mon, Oct 22, 2018 at 9:15 PM Sergio A. de Carvalho Jr. <
scarvalh...@gmail.com> wrote:

> Cool, I'll open a bug then.
>
> I was wondering if, before joining the metadata tables with the rest of
> instance data, we could do a UNION, since both tables are structurally
> identical.
>
> On Mon, Oct 22, 2018 at 9:04 PM Dan Smith  wrote:
>
>> > Do you guys see an easy fix here?
>> >
>> > Should I open a bug report?
>>
>> Definitely open a bug. IMHO, we should just make the single-instance
>> load work like the multi ones, where we load the metadata separately if
>> requested. We might be able to get away without sysmeta these days, but
>> we needed it for the flavor details back when the join was added. But,
>> user metadata is controllable by the user and definitely of interest in
>> that code, so just dropping sysmeta from the explicit required_attrs
>> isn't enough, IMHO.
>>
>> --Dan
>>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Sergio A. de Carvalho Jr.
Cool, I'll open a bug then.

I was wondering if, before joining the metadata tables with the rest of
instance data, we could do a UNION, since both tables are structurally
identical.

On Mon, Oct 22, 2018 at 9:04 PM Dan Smith  wrote:

> > Do you guys see an easy fix here?
> >
> > Should I open a bug report?
>
> Definitely open a bug. IMHO, we should just make the single-instance
> load work like the multi ones, where we load the metadata separately if
> requested. We might be able to get away without sysmeta these days, but
> we needed it for the flavor details back when the join was added. But,
> user metadata is controllable by the user and definitely of interest in
> that code, so just dropping sysmeta from the explicit required_attrs
> isn't enough, IMHO.
>
> --Dan
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Dan Smith
> Do you guys see an easy fix here?
>
> Should I open a bug report?

Definitely open a bug. IMHO, we should just make the single-instance
load work like the multi ones, where we load the metadata separately if
requested. We might be able to get away without sysmeta these days, but
we needed it for the flavor details back when the join was added. But,
user metadata is controllable by the user and definitely of interest in
that code, so just dropping sysmeta from the explicit required_attrs
isn't enough, IMHO.

--Dan

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Sergio A. de Carvalho Jr.
Thanks so much for the replies, guys.

> Have you debugged to the point of knowing where the initial DB query is
starting from?
>
> Looking at history, my guess is this is the change which introduced it
for all requests:
>
> https://review.openstack.org/#/c/276861/

That is my understanding too. Before, metadata was fetched separately but
this change meant that it both tables are always joined with other instance
data.

I've debugged it to the point where the query gets built, in

https://github.com/openstack/nova/blob/mitaka-eol/nova/db/sqlalchemy/api.py#L2005

which results in a number of left outer joins by the "joinedload" calls,
including to "instance_metadata" and "instance_system_metadata".

Most other tables have a single row for each instance (on our clusters,
anyway), so the impact for us is simply down to metadata.

> Just to be clear, you don't have any modifications to the code anywhere,
do you?

We do have some minor changes to Nova but nothing near instance, metadata
or the ORM code.

Do you guys see an easy fix here?

Should I open a bug report?

On Mon, Oct 22, 2018 at 7:17 PM Dan Smith  wrote:

> > We haven't been doing this (intentionally) for quite some time, as we
> > query and fill metadata linearly:
> >
> >
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L2244
> >
> > and have since 2013 (Havana):
> >
> > https://review.openstack.org/#/c/26136/
> >
> > So unless there has been a regression that is leaking those columns back
> > into the join list, I'm not sure why the query you show would be
> > generated.
>
> Ah, Matt Riedemann just pointed out on IRC that we're not doing it on
> single-instance fetch, which is what you'd be hitting in this path. We
> use that approach in a lot of places where the rows would also be
> multiplied by the number of instances, but not in the single case. So,
> that makes sense now.
>
> --Dan
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Dan Smith
> We haven't been doing this (intentionally) for quite some time, as we
> query and fill metadata linearly:
>
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L2244
>
> and have since 2013 (Havana):
>
> https://review.openstack.org/#/c/26136/
>
> So unless there has been a regression that is leaking those columns back
> into the join list, I'm not sure why the query you show would be
> generated.

Ah, Matt Riedemann just pointed out on IRC that we're not doing it on
single-instance fetch, which is what you'd be hitting in this path. We
use that approach in a lot of places where the rows would also be
multiplied by the number of instances, but not in the single case. So,
that makes sense now.

--Dan

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Dan Smith
> Of course this is only a problem when instances have a lot of metadata
> records. An instance with 50 records in "instance_metadata" and 50
> records in "instance_system_metadata" will fetch 50 x 50 = 2,500 rows
> from the database. It's not difficult to see how this can escalate
> quickly. This can be a particularly significant problem in a HA
> scenario with multiple API nodes pulling data from multiple database
> nodes.

We haven't been doing this (intentionally) for quite some time, as we
query and fill metadata linearly:

https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L2244

and have since 2013 (Havana):

https://review.openstack.org/#/c/26136/

So unless there has been a regression that is leaking those columns back
into the join list, I'm not sure why the query you show would be
generated.

Just to be clear, you don't have any modifications to the code anywhere,
do you?

--Dan

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Matt Riedemann

On 10/22/2018 11:59 AM, Matt Riedemann wrote:
Thanks for this. Have you debugged to the point of knowing where the 
initial DB query is starting from?


Looking at history, my guess is this is the change which introduced it 
for all requests:


https://review.openstack.org/#/c/276861/


From that change, as far as I can tell, we only needed to pre-join on 
metadata because of setting the "launch_metadata" variable:


https://review.openstack.org/#/c/276861/1/nova/api/metadata/base.py@145

I don't see anything directly using system_metadata, although that one 
is sometimes tricky and could be lazy-loaded elsewhere.


I do know that starting in ocata we use system_metadata for dynamic 
vendor metadata:


https://github.com/openstack/nova/blob/stable/ocata/nova/api/metadata/vendordata_dynamic.py#L85

Added in change: https://review.openstack.org/#/c/417780/

But if you don't provide vendor data then that should not be a problem.

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Matt Riedemann

On 10/22/2018 11:25 AM, Sergio A. de Carvalho Jr. wrote:

Hi,

While troubleshooting a production issue we identified that the Nova 
metadata API is fetching a lot more raw data from the database than 
seems necessary. The problem appears to be caused by the SQL query used 
to fetch instance data that joins the "instance" table with, among 
others, two metadata tables: "instance_metadata" and 
"instance_system_metadata". Below is a simplified version of this query 
(I've added the full query at the end of this message for reference):


SELECT ...
   FROM (SELECT ...
           FROM `instances`
          WHERE `instances` . `deleted` = ?
            AND `instances` . `uuid` = ?
          LIMIT ?) AS `anon_1`
   LEFT OUTER JOIN `instance_system_metadata` AS 
`instance_system_metadata_1`
     ON `anon_1` . `instances_uuid` = `instance_system_metadata_1` . 
`instance_uuid`
   LEFT OUTER JOIN (`security_group_instance_association` AS 
`security_group_instance_association_1`

                    INNER JOIN `security_groups` AS `security_groups_1`
                    ON `security_groups_1` . `id` = 
`security_group_instance_association_1` . `security_group_id`
                    AND `security_group_instance_association_1` . 
`deleted` = ?

                    AND `security_groups_1` . `deleted` = ? )
     ON `security_group_instance_association_1` . `instance_uuid` = 
`anon_1` . `instances_uuid`

    AND `anon_1` . `instances_deleted` = ?
   LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
     ON `security_group_rules_1` . `parent_group_id` = 
`security_groups_1` . `id`

    AND `security_group_rules_1` . `deleted` = ?
   LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
     ON `instance_info_caches_1` . `instance_uuid` = `anon_1` . 
`instances_uuid`

   LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
     ON `instance_extra_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
     ON `instance_metadata_1` . `instance_uuid` = `anon_1` . 
`instances_uuid`

    AND `instance_metadata_1` . `deleted` = ?

The instance table has a 1-to-many relationship to both 
"instance_metadata" and "instance_system_metadata" tables, so the query 
is effectively producing a cross join of both metadata tables.


To illustrate the impact of this query, I have an instance that has 2 
records in "instance_metadata" and 5 records in "instance_system_metadata":


 > select instance_uuid,`key`,value from instance_metadata where 
instance_uuid = 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';

+--+---++
| instance_uuid                        | key       | value  |
+--+---++
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value  |
+--+---++
2 rows in set (0.61 sec)

 > select instance_uuid,`key`,valusystem_metadata where instance_uuid = 
'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';

++--+
| key                    | value                                |
++--+
| image_disk_format      | qcow2                                |
| image_min_ram          | 0                                    |
| image_min_disk         | 20                                   |
| image_base_image_ref   | 39cd564f-6a29-43e2-815b-62097968486a |
| image_container_format | bare                                 |
++--+
5 rows in set (0.00 sec)

For this particular instance, the query used by the metadata API will 
fetch 10 records from the database:


+--+-+---++--+
| anon_1_instances_uuid                | instance_metadata_1_key | 
instance_metadata_1_value | instance_system_metadata_1_key | 
instance_system_metadata_1_value     |

+--+-+---++--+
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1               | 
value1                    | image_disk_format              | qcow2   
                          |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2               | value 
                     | image_disk_format              | qcow2   
                      |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1               | 
value1                    | image_min_ram                  | 0   
                          |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2               | value 
                     | image_min_ram                  | 0   

[openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Sergio A. de Carvalho Jr.
Hi,

While troubleshooting a production issue we identified that the Nova
metadata API is fetching a lot more raw data from the database than seems
necessary. The problem appears to be caused by the SQL query used to fetch
instance data that joins the "instance" table with, among others, two
metadata tables: "instance_metadata" and "instance_system_metadata". Below
is a simplified version of this query (I've added the full query at the end
of this message for reference):

SELECT ...
  FROM (SELECT ...
  FROM `instances`
 WHERE `instances` . `deleted` = ?
   AND `instances` . `uuid` = ?
 LIMIT ?) AS `anon_1`
  LEFT OUTER JOIN `instance_system_metadata` AS `instance_system_metadata_1`
ON `anon_1` . `instances_uuid` = `instance_system_metadata_1` .
`instance_uuid`
  LEFT OUTER JOIN (`security_group_instance_association` AS
`security_group_instance_association_1`
   INNER JOIN `security_groups` AS `security_groups_1`
   ON `security_groups_1` . `id` =
`security_group_instance_association_1` . `security_group_id`
   AND `security_group_instance_association_1` . `deleted`
= ?
   AND `security_groups_1` . `deleted` = ? )
ON `security_group_instance_association_1` . `instance_uuid` = `anon_1`
. `instances_uuid`
   AND `anon_1` . `instances_deleted` = ?
  LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
ON `security_group_rules_1` . `parent_group_id` = `security_groups_1` .
`id`
   AND `security_group_rules_1` . `deleted` = ?
  LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
ON `instance_info_caches_1` . `instance_uuid` = `anon_1` .
`instances_uuid`
  LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
ON `instance_extra_1` . `instance_uuid` = `anon_1` . `instances_uuid`
  LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
ON `instance_metadata_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   AND `instance_metadata_1` . `deleted` = ?

The instance table has a 1-to-many relationship to both "instance_metadata"
and "instance_system_metadata" tables, so the query is effectively
producing a cross join of both metadata tables.

To illustrate the impact of this query, I have an instance that has 2
records in "instance_metadata" and 5 records in "instance_system_metadata":

> select instance_uuid,`key`,value from instance_metadata where
instance_uuid = 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
+--+---++
| instance_uuid| key   | value  |
+--+---++
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value  |
+--+---++
2 rows in set (0.61 sec)

> select instance_uuid,`key`,valusystem_metadata where instance_uuid =
'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
++--+
| key| value|
++--+
| image_disk_format  | qcow2|
| image_min_ram  | 0|
| image_min_disk | 20   |
| image_base_image_ref   | 39cd564f-6a29-43e2-815b-62097968486a |
| image_container_format | bare |
++--+
5 rows in set (0.00 sec)

For this particular instance, the query used by the metadata API will fetch
10 records from the database:

+--+-+---++--+
| anon_1_instances_uuid| instance_metadata_1_key |
instance_metadata_1_value | instance_system_metadata_1_key |
instance_system_metadata_1_value |
+--+-+---++--+
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1   | value1
 | image_disk_format  | qcow2
 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2   | value
  | image_disk_format  | qcow2
   |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1   | value1
 | image_min_ram  | 0
 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2   | value
  | image_min_ram  | 0
   |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1   | value1
 | image_min_disk | 20
  |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b |