I deployed the open source puppet-dashboard 2.0.0 this past weekend for our 
production environment.  I did a fair amount of testing in the lab to make 
sure I had the deployment down, and I deployed it as a Passenger service, 
knowing that we have a large environment and that WEBrick wasn't likely to 
cut it.  Overall it appears to be working and behaving reasonably: I get the 
summary run-status graph, the rest of the UI, and so on.  Load average on 
the box is high-ish but nothing unreasonable, and I certainly appear to have 
headroom in memory and CPU.

However, when I click the "export nodes as CSV" link, it runs forever (it 
hasn't stopped yet).

I looked into what the database was doing, and it appears to be looping over 
some unknown number of report_ids, running a query like this for each one.  
This snapshot is from pg_stat_activity (process 7172 against the dashboard 
database; that particular query had been going for about 15 seconds when I 
captured it):

    SELECT COUNT(*) FROM "resource_statuses"
    WHERE "resource_statuses"."report_id" = 39467
      AND "resource_statuses"."failed" = 'f'
      AND ("resource_statuses"."id" IN (
            SELECT resource_statuses.id FROM resource_statuses
            INNER JOIN resource_events
              ON resource_statuses.id = resource_events.resource_status_id
            WHERE resource_events.status = 'noop'
      ))



I ran the inner join by hand and it takes roughly 2-3 minutes each time.  
The overall query appears to take about 8 minutes per report ID.
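
For what it's worth, this is roughly what I'm planning to try next.  It's 
only a sketch based on the column names in the query above; I haven't yet 
checked which indexes already exist, and the index names here are made up:

    -- Look at the plan for the slow inner join:
    EXPLAIN ANALYZE
    SELECT resource_statuses.id
    FROM resource_statuses
    INNER JOIN resource_events
      ON resource_statuses.id = resource_events.resource_status_id
    WHERE resource_events.status = 'noop';

    -- Candidate indexes if they turn out to be missing:
    CREATE INDEX resource_events_rs_status_idx
      ON resource_events (resource_status_id, status);
    CREATE INDEX resource_statuses_report_id_idx
      ON resource_statuses (report_id);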

I had already done a few things to tweak PostgreSQL before taking these 
measurements, so the queries could have been running even longer when I 
first noticed the problem.

I increased checkpoint_segments to 32 from the default of 3 and 
checkpoint_completion_target to 0.9 from the default of 0.5, and to be able 
to observe what's going on I set stats_command_string to on.
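
That last setting is what let me grab the snapshot above, using something 
along these lines (the column names depend on the PostgreSQL version; newer 
releases use pid and query instead of procpid and current_query):

    -- Show active queries and how long each has been running
    SELECT procpid, datname, current_query,
           now() - query_start AS runtime
    FROM pg_stat_activity
    ORDER BY runtime DESC;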

Some other details: we have 3400 nodes (dashboard is only seeing 3290 or 
so, which is part of why I want this CSV report, to figure out why the 
number is smaller).  This PostgreSQL instance also supports PuppetDB, 
though obviously in a separate database.  The resource_statuses table has 
47 million rows right now, and the inner join returns 4.3 million.
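
For scale, the on-disk sizes (tables plus their indexes) can be checked 
with something like this, assuming the standard table names:

    SELECT pg_size_pretty(pg_total_relation_size('resource_statuses')) AS resource_statuses,
           pg_size_pretty(pg_total_relation_size('resource_events'))   AS resource_events;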

I'm curious whether anyone else is running this version on PostgreSQL with 
a large environment, and if so, where I ought to be looking to tune this so 
it runs faster, or whether I need to do something to shrink those tables 
without losing information.
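
If the answer turns out to be pruning old data, I'd first want to see how 
far back it goes; something like this, assuming the reports table keeps a 
time column (I haven't double-checked the schema):

    SELECT min("time") AS oldest, max("time") AS newest, count(*) AS total
    FROM reports;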

Thanks

Pete
