Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Nacho Barrientos commented on PDB-4830 Re: Big performance impact when upgrading to PuppetDB5/6 It would have been nice if, after three years of basically zero communication and finally closing this issue as "won't fix", (at least) some pointers to the "current work" had been left in the ticket. I think it's the bare minimum. Add Comment This message was sent by Atlassian Jira (v8.20.21#820021-sha1:38274c8) -- You received this message because you are subscribed to the Google Groups "Puppet Bugs" group. To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.1062.1687198740037%40Atlassian.JIRA.
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Margaret Lee updated an issue PuppetDB / PDB-4830 Big performance impact when upgrading to PuppetDB5/6 Change By: Margaret Lee Epic Link: PE-31891 Add Comment This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97) -- You received this message because you are subscribed to the Google Groups "Puppet Bugs" group. To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.22477.1619735400230%40Atlassian.JIRA.
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Kamil Swoboda commented on PDB-4830 Re: Big performance impact when upgrading to PuppetDB5/6 Too bad Thanks for the replay though! If anything changes would be great if you'd post here Add Comment This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935) -- You received this message because you are subscribed to the Google Groups "Puppet Bugs" group. To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.157256.1614854880035%40Atlassian.JIRA.
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Nacho Barrientos commented on PDB-4830 Re: Big performance impact when upgrading to PuppetDB5/6 Not at all, we're stuck in PuppetDB4 due to this Add Comment This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935) -- You received this message because you are subscribed to the Google Groups "Puppet Bugs" group. To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.155793.1614756300023%40Atlassian.JIRA.
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Kamil Swoboda commented on PDB-4830 Re: Big performance impact when upgrading to PuppetDB5/6 Nacho Barrientos were you able to resolve this issue? We are facing exactly the same problem during migration from Puppet 4 to Puppet 7. Works fine on Puppetdb 4 and starting from Puppetdb5 query performance is terrible. My ticket is PDB-5051 would be really great if you've got some information how to fix this problem or if you've found some sort of a workaround Thanks! Add Comment This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935) -- You received this message because you are subscribed to the Google Groups "Puppet Bugs" group. To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.154920.1614696480031%40Atlassian.JIRA.
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Nacho Barrientos commented on PDB-4830 Re: Big performance impact when upgrading to PuppetDB5/6 Thanks once again for your time. Is it important for your usecase that only the queries like /pdb/query/v4/facts// are performant, or do you also make use of queries that don't provide the value, like /pdb/query/v4/facts/? It'd be a good starting point but to be honest it would not be enough. Most of our queries (generated by the module above) talk directly to /pdb/query/v4/facts passing a query, for example: "Get the value of the ipaddress fact for nodes whose fact fact1 has x as value, fact fact2 has y and fact fact3 has z": via the query_nodes() function: query_nodes('fact1="x" and fact2="y" and fact3="z"', ipaddress) translates into: GET pdb/query/v4/facts --query '["extract",["value"],["and",["and",["in","certname",["extract","certname",["select_fact_contents",["and",["=","path",["fact2"]],["=","value","y"],["in","certname",["extract","certname",["select_fact_contents",["and",["=","path",["fact1"]],["=","value","x"],["in","certname",["extract","certname",["select_fact_contents",["and",["=","path",["fact3"]],["=","value","z"]],["or",["=","name","ipaddress"' # PuppetDB 4 real0m0.471s # PuppetDB 5 real0m16.302s
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Austin Blatt commented on PDB-4830 Re: Big performance impact when upgrading to PuppetDB5/6 Yeah, that makes sense. The facts endpoint does appear to be underperforming, and I haven't found any queries yet that would actually make use of the index idx_factsets_jsonb_merged, so I'm going to do some investigation in the git history and try to see if it was added because someone thought it would optimize queries like these, or for another reason. If we do go down the path of optimizing the /facts endpoint with the @> operator, it'll need to be for specific types of /facts queries as it's not as general as the function (which is just the #> operator). Is it important for your usecase that only the queries like /pdb/query/v4/facts// are performant, or do you also make use of queries that don't provide the value, like /pdb/query/v4/facts/ Add Comment This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935) -- You received this message because you are subscribed to the Google Groups "Puppet Bugs" group. To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.4239.1596041220070%40Atlassian.JIRA.
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Nacho Barrientos commented on PDB-4830 Re: Big performance impact when upgrading to PuppetDB5/6 Hi Austin Blatt, Thanks for taking a look at the ticket and for coming back to me. For the use-case of retrieving arbitrary fact values for simple fact queries, using the inventory is indeed much faster: # PuppetDB 5 GET raw pdb/query/v4/facts/fqdn/node1.example.org "node1.example.org" 0m8.358s # PuppetDB 5 GET raw pdb/query/v4/inventory --query 'query=["=", "facts.fqdn", "node1.example.org"]' | jq '.[].certname' "node1.example.org" 0m0.236s which is comparable to the performance we have in production:
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Austin Blatt commented on PDB-4830 Re: Big performance impact when upgrading to PuppetDB5/6 Nacho Barrientos Have you checked out the inventory endpoint it often has better performance for single fact queries like this. An example inventory PQL query that should return return a similar set of nodes as the facts query above is inventory[] { facts. = "" } In PDB 6.7.0+ you can also restrict the facts that are returned with dot notation inventory[certname, facts.os.family] { facts. = "" } Add Comment This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Nacho Barrientos updated an issue PuppetDB / PDB-4830 Big performance impact when upgrading to PuppetDB5/6 Change By: Nacho Barrientos Hi,We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5.There's a type of query that we do rather often (directly to the API or via the PuppetDB query module when compiling catalogs) which is to select _certnames_ for which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:{noformat}GET pdb/query/v4/facts/factname/factvalue{noformat}In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the _facts_ table we added an index like this:{noformat}"facts_value_string_idx" btree (value_string){noformat}which for a query on the database the size above for nodes with the fact 'fqdn' _fqdn_ set to 'node1 _node1 .example. org` org_ performed pretty well thanks to the index:{noformat}[ { "certname": "node1.example.org", "environment": "production", "name": "fqdn", "value": "node1.example.org" } ]{noformat}*real 0m0.319s*The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.However, in the new world (PuppetDB5 and beyond) it seems that the _facts_ table is gone and now the fact values are in two columns in the _factsets_ table (_stable_ and _volatile_) of type _jsonb_.With the same HW setup, PostgreSQL configuration, OS configuration and a copy of the data (we're populating both instances at the same time using an extra _submit_only_server_urls_ in the terminus), the query above produces the same results as expected but it is much slower:*real 0m7.403s*We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.Looking at the possible indices that this query could use we found this one:{noformat}"idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops){noformat}However, taking a look at the generated SQL, PuppetDB seems to be using _jsonb_extract_path()_ to then compare to the value being looked for. However, this approach cannot benefit from the index above.I have the whole query if needed but to illustrate the issue what I did was to analyze the query and extract the juiciest subquery that could show the problem which is:{noformat}explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb; QUERY PLAN-- Seq Scan on factsets (cost=0.00...63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1) Filter: (jsonb_extract_path((stable || volatile), VARIADIC '\{fqdn}'::text[]) = '"node1.example.org"'::jsonb) Rows Removed by Filter: 36492 Execution Time: 5292.528 ms{noformat}As you can see it's a full table scan.
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Nacho Barrientos updated an issue PuppetDB / PDB-4830 Big performance impact when upgrading to PuppetDB5/6 Change By: Nacho Barrientos Hi,We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5.There's a type of query that we do rather often (directly to the API or via the PuppetDB query module when compiling catalogs) which is to select _certnames_ for which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:{noformat}GET pdb/query/v4/facts/factname/factvalue{noformat}In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the _facts_ table we added an index like this:{noformat}"facts_value_string_idx" btree (value_string){noformat}which for a query on the database the size above for nodes with the fact 'fqdn' set to 'node1.example.org` performed pretty well thanks to the index:{noformat}[ { "certname": "node1. cern example . ch org ", "environment": "production", "name": "fqdn", "value": "node1. cern example . ch org " } ]{noformat}*real 0m0.319s*The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.However, in the new world (PuppetDB5 and beyond) it seems that the _facts_ table is gone and now the fact values are in two columns in the _factsets_ table (_stable_ and _volatile_) of type _jsonb_.With the same HW setup, PostgreSQL configuration, OS configuration and a copy of the data (we're populating both instances at the same time using an extra _submit_only_server_urls_ in the terminus), the query above produces the same results as expected but it is much slower:*real 0m7.403s*We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.Looking at the possible indices that this query could use we found this one:{noformat}"idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops){noformat}However, taking a look at the generated SQL, PuppetDB seems to be using _jsonb_extract_path()_ to then compare to the value being looked for. However, this approach cannot benefit from the index above.I have the whole query if needed but to illustrate the issue what I did was to analyze the query and extract the juiciest subquery that could show the problem which is:{noformat}explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb; QUERY PLAN-- Seq Scan on factsets (cost=0.00...63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1) Filter: (jsonb_extract_path((stable || volatile), VARIADIC '\{fqdn}'::text[]) = '"node1.example.org"'::jsonb) Rows Removed by Filter: 36492 Execution Time: 5292.528 ms{noformat}As you can see it's a full table scan. However,
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Nacho Barrientos updated an issue PuppetDB / PDB-4830 Big performance impact when upgrading to PuppetDB5/6 Change By: Nacho Barrientos Hi,We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5.There's a type of query that we do rather often (directly to the API or via the PuppetDB query module when compiling catalogs) which is to select _certnames_ for which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:{noformat}GET pdb/query/v4/facts/factname/factvalue{noformat}In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the _facts_ table we added an index like this:{noformat}"facts_value_string_idx" btree (value_string){noformat}which for a query on the database the size above for nodes with the fact 'fqdn' set to 'node1.example.org` performed pretty well thanks to the index:{noformat}[ { "certname": "node1.cern.ch", "environment": "production", "name": "fqdn", "value": "node1.cern.ch" } ]{noformat}*real 0m0.319s*The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.However, in the new world (PuppetDB5 and beyond) it seems that the _facts_ table is gone and now the fact values are in two columns in the _factsets_ table (_stable_ and _volatile_) of type _jsonb_.With the same HW setup, PostgreSQL configuration, OS configuration and a copy of the data (we're populating both instances at the same time using an extra _submit_only_server_urls_ in the terminus), the query above produces the same results as expected but it is much slower:*real 0m7.403s*We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.Looking at the possible indices that this query could use we found this one:{noformat}"idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops){noformat}However, taking a look at the generated SQL, PuppetDB seems to be using _jsonb_extract_path()_ to then compare to the value being looked for. However, this approach cannot benefit from the index above.I have the whole query if needed but to illustrate the issue what I did was to analyze the query and extract the juiciest subquery that could show the problem which is:{noformat}explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb; QUERY PLAN-- Seq Scan on factsets (cost=0.00...63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1) Filter: (jsonb_extract_path((stable || volatile), VARIADIC '\{fqdn}'::text[]) = '"node1.example.org"'::jsonb) Rows Removed by Filter: 36492 Execution Time: 5292.528 ms{noformat}As you can see it's a full table scan. However, if the query
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Nacho Barrientos updated an issue PuppetDB / PDB-4830 Big performance impact when upgrading to PuppetDB5/6 Change By: Nacho Barrientos Hi,We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5.There's a type of query that we do rather often (directly to the API or via the PuppeDB PuppetDB query module when compiling catalogs) which is to select _certnames_ forwhich a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:{noformat}GET pdb/query/v4/facts/factname/factvalue{noformat}In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the _facts_ table we added an index like this:{noformat}"facts_value_string_idx" btree (value_string){noformat}which for a query on the database the size above for nodes with the fact 'fqdn' set to 'node1.example.org` performed pretty well thanks to the index:{noformat}[ { "certname": "node1.cern.ch", "environment": "production", "name": "fqdn", "value": "node1.cern.ch" } ]{noformat}*real 0m0.319s*The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.However, in the new world (PuppetDB5 and beyond) it seems that the _facts_ table is gone and now the fact values are in two columns in the _factsets_ table (_stable_ and _volatile_) of type _jsonb_.With the same HW setup, PostgreSQL configuration, OS configuration and a copy of the data (we're populating both instances at the same time using an extra _submit_only_server_urls_ in the terminus), the query above produces the same results as expected but it is much slower:*real 0m7.403s*We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.Looking at the possible indices that this query could use we found this one:{noformat}"idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops){noformat}However, taking a look at the generated SQL, PuppetDB seems to be using _jsonb_extract_path()_ to then compare to the value being looked for. However, this approach cannot benefit from the index above.I have the whole query if needed but to illustrate the issue what I did was to analyze the query and extract the juiciest subquery that could show the problem which is:{noformat}explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb; QUERY PLAN-- Seq Scan on factsets (cost=0.00...63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1) Filter: (jsonb_extract_path((stable || volatile), VARIADIC '\{fqdn}'::text[]) = '"node1.example.org"'::jsonb) Rows Removed by Filter: 36492 Execution Time: 5292.528 ms{noformat}As you can see it's a full table scan. However, if
Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6
Title: Message Title Nacho Barrientos created an issue PuppetDB / PDB-4830 Big performance impact when upgrading to PuppetDB5/6 Issue Type: Bug Assignee: Unassigned Created: 2020/07/27 6:23 AM Priority: Normal Reporter: Nacho Barrientos Hi, We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5. There's a type of query that we do rather often (directly to the API or via the PuppeDB query module when compiling catalogs) which is to select certnames for which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like: GET pdb/query/v4/facts/factname/factvalue In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the facts table we added an index like this: "facts_value_string_idx" btree (value_string) which for a query on the database the size above for nodes with the fact 'fqdn' set to 'node1.example.org` performed pretty well thanks to the index: