Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2023-06-19 Thread 'Nacho Barrientos (Jira)' via Puppet Bugs
Title: Message Title


 
 
 
 

 
 
 

 
   
 Nacho Barrientos commented on  PDB-4830  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
 It would have been nice if, after three years of basically zero communication and finally closing this issue as "won't fix", (at least) some pointers to the "current work" had been left in the ticket. I think it's the bare minimum.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v8.20.21#820021-sha1:38274c8)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.1062.1687198740037%40Atlassian.JIRA.


Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2021-04-29 Thread Margaret Lee (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Margaret Lee updated an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 PuppetDB /  PDB-4830  
 
 
  Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
Change By: 
 Margaret Lee  
 
 
Epic Link: 
 PE-31891  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.22477.1619735400230%40Atlassian.JIRA.


Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2021-03-04 Thread Kamil Swoboda (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Kamil Swoboda commented on  PDB-4830  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
 Too bad  Thanks for the replay though! If anything changes would be great if you'd post here    
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.157256.1614854880035%40Atlassian.JIRA.


Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2021-03-02 Thread Nacho Barrientos (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Nacho Barrientos commented on  PDB-4830  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
 Not at all, we're stuck in PuppetDB4 due to this   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.155793.1614756300023%40Atlassian.JIRA.


Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2021-03-02 Thread Kamil Swoboda (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Kamil Swoboda commented on  PDB-4830  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
 Nacho Barrientos were you able to resolve this issue? We are facing exactly the same problem during migration from Puppet 4 to Puppet 7. Works fine on Puppetdb 4 and starting from Puppetdb5 query performance is terrible. My ticket is PDB-5051 would be really great if you've got some information how to fix this problem or if you've found some sort of a workaround  Thanks!   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.154920.1614696480031%40Atlassian.JIRA.


Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2020-07-30 Thread Nacho Barrientos (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Nacho Barrientos commented on  PDB-4830  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
 Thanks once again for your time. 

Is it important for your usecase that only the queries like /pdb/query/v4/facts// are performant, or do you also make use of queries that don't provide the value, like /pdb/query/v4/facts/?
 It'd be a good starting point but to be honest it would not be enough. Most of our queries (generated by the module above) talk directly to /pdb/query/v4/facts passing a query, for example: "Get the value of the ipaddress fact for nodes whose fact fact1 has x as value, fact fact2 has y and fact fact3 has z": via the query_nodes() function:  
 
 
 
 
 query_nodes('fact1="x" and fact2="y" and fact3="z"',  ipaddress)
  
 
 
 
  translates into:  
 
 
 
 
 GET pdb/query/v4/facts --query  
 
 
 '["extract",["value"],["and",["and",["in","certname",["extract","certname",["select_fact_contents",["and",["=","path",["fact2"]],["=","value","y"],["in","certname",["extract","certname",["select_fact_contents",["and",["=","path",["fact1"]],["=","value","x"],["in","certname",["extract","certname",["select_fact_contents",["and",["=","path",["fact3"]],["=","value","z"]],["or",["=","name","ipaddress"'  
 
 
 # PuppetDB 4  
 
 
 real0m0.471s  
 
 
 # PuppetDB 5  
 
 
 real0m16.302s
  
 
 
 
  

Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2020-07-29 Thread Austin Blatt (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Austin Blatt commented on  PDB-4830  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
 Yeah, that makes sense. The facts endpoint does appear to be underperforming, and I haven't found any queries yet that would actually make use of the index idx_factsets_jsonb_merged, so I'm going to do some investigation in the git history and try to see if it was added because someone thought it would optimize queries like these, or for another reason. If we do go down the path of optimizing the /facts endpoint with the @> operator, it'll need to be for specific types of /facts queries as it's not as general as the function (which is just the #> operator). Is it important for your usecase that only the queries like /pdb/query/v4/facts// are performant, or do you also make use of queries that don't provide the value, like /pdb/query/v4/facts/  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-bugs/JIRA.367361.1595856224000.4239.1596041220070%40Atlassian.JIRA.


Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2020-07-29 Thread Nacho Barrientos (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Nacho Barrientos commented on  PDB-4830  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
 Hi Austin Blatt, Thanks for taking a look at the ticket and for coming back to me. For the use-case of retrieving arbitrary fact values for simple fact queries, using the inventory is indeed much faster:  
 
 
 
 
 # PuppetDB 5  
 
 
 GET raw pdb/query/v4/facts/fqdn/node1.example.org  
 
 
 "node1.example.org"  
 
 
    
 
 
 0m8.358s
  
 
 
 
   
 
 
 
 
 # PuppetDB 5  
 
 
 GET raw pdb/query/v4/inventory --query 'query=["=", "facts.fqdn", "node1.example.org"]' | jq '.[].certname'  
 
 
 "node1.example.org"  
 
 
    
 
 
 0m0.236s
  
 
 
 
  which is comparable to the performance we have in production:  
 
 

Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2020-07-28 Thread Austin Blatt (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Austin Blatt commented on  PDB-4830  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
 Nacho Barrientos Have you checked out the inventory endpoint it often has better performance for single fact queries like this. An example inventory PQL query that should return return a similar set of nodes as the facts query above is  
 
 
 
 
 inventory[] { facts. = "" }
  
 
 
 
  In PDB 6.7.0+ you can also restrict the facts that are returned with dot notation  
 
 
 
 
 inventory[certname, facts.os.family] { facts. = "" }
  
 
 
 
   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)  
 
 

 
   
 

  
 
   

Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2020-07-27 Thread Nacho Barrientos (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Nacho Barrientos updated an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 PuppetDB /  PDB-4830  
 
 
  Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
Change By: 
 Nacho Barrientos  
 

  
 
 
 
 

 
 Hi,We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5.There's a type of query that we do rather often (directly to the API or via the PuppetDB query module when compiling catalogs) which is to select _certnames_ for which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:{noformat}GET pdb/query/v4/facts/factname/factvalue{noformat}In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the _facts_ table we added an index like this:{noformat}"facts_value_string_idx" btree (value_string){noformat}which for a query on the database the size above for nodes with the fact  'fqdn'  _fqdn_  set  to  'node1  _node1 .example. org` org_  performed pretty well thanks to the index:{noformat}[ {  "certname": "node1.example.org", "environment": "production", "name": "fqdn", "value": "node1.example.org" } ]{noformat}*real 0m0.319s*The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.However, in the new world (PuppetDB5 and beyond) it seems that the _facts_ table is gone and now the fact values are in two columns in the _factsets_ table (_stable_ and _volatile_) of type _jsonb_.With the same HW setup, PostgreSQL configuration, OS configuration and a copy of the data (we're populating both instances at the same time using an extra _submit_only_server_urls_ in the terminus), the query above produces the same results as expected but it is much slower:*real 0m7.403s*We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.Looking at the possible indices that this query could use we found this one:{noformat}"idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops){noformat}However, taking a look at the generated SQL, PuppetDB seems to be using _jsonb_extract_path()_ to then compare to the value being looked for. However, this approach cannot benefit from the index above.I have the whole query if needed but to illustrate the issue what I did was to analyze the query and extract the juiciest subquery that could show the problem which is:{noformat}explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb; QUERY PLAN-- Seq Scan on factsets (cost=0.00...63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1) Filter: (jsonb_extract_path((stable || volatile), VARIADIC '\{fqdn}'::text[]) = '"node1.example.org"'::jsonb) Rows Removed by Filter: 36492 Execution Time: 5292.528 ms{noformat}As you can see it's a full table scan. 

Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2020-07-27 Thread Nacho Barrientos (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Nacho Barrientos updated an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 PuppetDB /  PDB-4830  
 
 
  Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
Change By: 
 Nacho Barrientos  
 

  
 
 
 
 

 
 Hi,We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5.There's a type of query that we do rather often (directly to the API or via the PuppetDB query module when compiling catalogs) which is to select _certnames_ for which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:{noformat}GET pdb/query/v4/facts/factname/factvalue{noformat}In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the _facts_ table we added an index like this:{noformat}"facts_value_string_idx" btree (value_string){noformat}which for a query on the database the size above for nodes with the fact 'fqdn' set  to 'node1.example.org` performed pretty well thanks to the index:{noformat}[ {  "certname": "node1. cern example . ch org ", "environment": "production", "name": "fqdn", "value": "node1. cern example . ch org " } ]{noformat}*real 0m0.319s*The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.However, in the new world (PuppetDB5 and beyond) it seems that the _facts_ table is gone and now the fact values are in two columns in the _factsets_ table (_stable_ and _volatile_) of type _jsonb_.With the same HW setup, PostgreSQL configuration, OS configuration and a copy of the data (we're populating both instances at the same time using an extra _submit_only_server_urls_ in the terminus), the query above produces the same results as expected but it is much slower:*real 0m7.403s*We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.Looking at the possible indices that this query could use we found this one:{noformat}"idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops){noformat}However, taking a look at the generated SQL, PuppetDB seems to be using _jsonb_extract_path()_ to then compare to the value being looked for. However, this approach cannot benefit from the index above.I have the whole query if needed but to illustrate the issue what I did was to analyze the query and extract the juiciest subquery that could show the problem which is:{noformat}explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb; QUERY PLAN-- Seq Scan on factsets (cost=0.00...63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1) Filter: (jsonb_extract_path((stable || volatile), VARIADIC '\{fqdn}'::text[]) = '"node1.example.org"'::jsonb) Rows Removed by Filter: 36492 Execution Time: 5292.528 ms{noformat}As you can see it's a full table scan. However, 

Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2020-07-27 Thread Nacho Barrientos (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Nacho Barrientos updated an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 PuppetDB /  PDB-4830  
 
 
  Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
Change By: 
 Nacho Barrientos  
 

  
 
 
 
 

 
 Hi,We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5.There's a type of query that we do rather often (directly to the API or via the PuppetDB query module when compiling catalogs) which is to select _certnames_ for which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:{noformat}GET pdb/query/v4/facts/factname/factvalue{noformat}In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the _facts_ table we added an index like this:{noformat}"facts_value_string_idx" btree (value_string){noformat}which for a query on the database the size above for nodes with the fact 'fqdn' set  to 'node1.example.org` performed pretty well thanks to the index:{noformat}[ {  "certname": "node1.cern.ch", "environment": "production", "name": "fqdn", "value": "node1.cern.ch" } ]{noformat}*real 0m0.319s*The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.However, in the new world (PuppetDB5 and beyond) it seems that the _facts_ table is gone and now the fact values are in two columns in the _factsets_ table (_stable_ and _volatile_) of type _jsonb_.With the same HW setup, PostgreSQL configuration, OS configuration and a copy of the data (we're populating both instances at the same time using an extra _submit_only_server_urls_ in the terminus), the query above produces the same   results as expected but it is much slower:*real 0m7.403s*We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.Looking at the possible indices that this query could use we found this one:{noformat}"idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops){noformat}However, taking a look at the generated SQL, PuppetDB seems to be using   _jsonb_extract_path()_ to then compare to the value being looked for. However,   this approach cannot benefit from the index above.I have the whole query if needed but to illustrate the issue what I did was to   analyze the query and extract the juiciest subquery that could show the   problem which is:{noformat}explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb; QUERY PLAN-- Seq Scan on factsets (cost=0.00...63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1) Filter: (jsonb_extract_path((stable || volatile), VARIADIC '\{fqdn}'::text[]) = '"node1.example.org"'::jsonb) Rows Removed by Filter: 36492 Execution Time: 5292.528 ms{noformat}As you can see it's a full table scan. However, if the query 

Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2020-07-27 Thread Nacho Barrientos (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Nacho Barrientos updated an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 PuppetDB /  PDB-4830  
 
 
  Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
Change By: 
 Nacho Barrientos  
 

  
 
 
 
 

 
 Hi,We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from   PuppetDB4 to PuppetDB5.There's a type of query that we do rather often (directly to the API or via the   PuppeDB  PuppetDB  query module when compiling catalogs) which is to select _certnames_ forwhich a certain fact has a given value. The simplest way to reproduce this   scenario is to issue something like:{noformat}GET pdb/query/v4/facts/factname/factvalue{noformat}In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the _facts_ table we added an index like this:{noformat}"facts_value_string_idx" btree (value_string){noformat}which for a query on the database the size above for nodes with the fact 'fqdn' set      to 'node1.example.org` performed pretty well thanks to the index:{noformat}[ {  "certname": "node1.cern.ch", "environment": "production", "name": "fqdn", "value": "node1.cern.ch" } ]{noformat}*real 0m0.319s*The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.However, in the new world (PuppetDB5 and beyond) it seems that the _facts_ table is gone and now the fact values are in two columns in the _factsets_ table (_stable_ and _volatile_) of   type _jsonb_.With the same HW setup, PostgreSQL configuration, OS configuration and a copy   of the data (we're populating both instances at the same time using an extra   _submit_only_server_urls_ in the terminus), the query above produces the same results as expected but it is much slower:*real 0m7.403s*We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.Looking at the possible indices that this query could use we found this one:{noformat}"idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops){noformat}However, taking a look at the generated SQL, PuppetDB seems to be using _jsonb_extract_path()_ to then compare to the value being looked for. However, this approach cannot benefit from the index above.I have the whole query if needed but to illustrate the issue what I did was to analyze the query and extract the juiciest subquery that could show the problem which is:{noformat}explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb; QUERY PLAN-- Seq Scan on factsets (cost=0.00...63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1) Filter: (jsonb_extract_path((stable || volatile), VARIADIC '\{fqdn}'::text[]) = '"node1.example.org"'::jsonb) Rows Removed by Filter: 36492 Execution Time: 5292.528 ms{noformat}As you can see it's a full table scan. However, if 

Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6

2020-07-27 Thread Nacho Barrientos (Jira)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Nacho Barrientos created an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 PuppetDB /  PDB-4830  
 
 
  Big performance impact when upgrading to PuppetDB5/6   
 

  
 
 
 
 

 
Issue Type: 
  Bug  
 
 
Assignee: 
 Unassigned  
 
 
Created: 
 2020/07/27 6:23 AM  
 
 
Priority: 
  Normal  
 
 
Reporter: 
 Nacho Barrientos  
 

  
 
 
 
 

 
 Hi, We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5. There's a type of query that we do rather often (directly to the API or via the  PuppeDB query module when compiling catalogs) which is to select certnames for  which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:  
 
 
 
 
 GET pdb/query/v4/facts/factname/factvalue
  
 
 
 
  In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the facts table we added an index like this:  
 
 
 
 
 "facts_value_string_idx" btree (value_string)
  
 
 
 
  which for a query on the database the size above for nodes with the fact 'fqdn' set  to 'node1.example.org` performed pretty well thanks to the index: