Issue #2691 has been updated by Marc Fournier. Affected version changed from 0.25.1rc1 to 0.25.1rc2
I have more details about this issue. As Brice suggested, I had a look at the performance of MySQL, but it didn't seem to be missing any indexes. Just to be sure, I dropped and recreated an empty database and let all the clients rebuild the content. I then tried collecting all the resources on one node, but ran into the same timeout/high-load issue again. I also checked out 0.25.1rc2, but nothing magically happened in the meantime to solve the issue...

I noticed that the SQL queries used to collect the resources are slightly different between my 0.24.8 server and the 0.25.1rc2 one. On the 0.24.8 server, the query returns about 150 rows, one per collected resource. The query sent to MySQL by 0.25.1rc2 returns 202879 rows; dumped to a text file, this is about 75 MB of data. I can imagine the puppetmaster has some trouble digesting all this, which could explain the timeouts and high CPU usage.

This is the SQL generated on 0.24.8:

<pre>
SELECT `resources`.`id` AS t0_r0, `resources`.`title` AS t0_r1, `resources`.`restype` AS t0_r2,
       `resources`.`host_id` AS t0_r3, `resources`.`source_file_id` AS t0_r4, `resources`.`exported` AS t0_r5,
       `resources`.`line` AS t0_r6, `resources`.`updated_at` AS t0_r7, `resources`.`created_at` AS t0_r8,
       `param_values`.`id` AS t1_r0, `param_values`.`value` AS t1_r1, `param_values`.`param_name_id` AS t1_r2,
       `param_values`.`line` AS t1_r3, `param_values`.`resource_id` AS t1_r4, `param_values`.`updated_at` AS t1_r5,
       `param_values`.`created_at` AS t1_r6, `param_names`.`id` AS t2_r0, `param_names`.`name` AS t2_r1,
       `param_names`.`updated_at` AS t2_r2, `param_names`.`created_at` AS t2_r3
FROM `resources`
LEFT OUTER JOIN `param_values` ON param_values.resource_id = resources.id
LEFT OUTER JOIN `param_names` ON `param_names`.id = `param_values`.param_name_id
WHERE (host_id != 281
  AND (exported=1 AND restype='Nagios_service')
  AND (param_values.value = 'nagiosserver.domain.tld' and param_names.name = 'tag'))
</pre>

This is the SQL generated on 0.25.1rc2:

<pre>
SELECT `resources`.`id` AS t0_r0, `resources`.`title` AS t0_r1, `resources`.`restype` AS t0_r2,
       `resources`.`host_id` AS t0_r3, `resources`.`source_file_id` AS t0_r4, `resources`.`exported` AS t0_r5,
       `resources`.`line` AS t0_r6, `resources`.`updated_at` AS t0_r7, `resources`.`created_at` AS t0_r8,
       `param_values`.`id` AS t1_r0, `param_values`.`value` AS t1_r1, `param_values`.`param_name_id` AS t1_r2,
       `param_values`.`line` AS t1_r3, `param_values`.`resource_id` AS t1_r4, `param_values`.`updated_at` AS t1_r5,
       `param_values`.`created_at` AS t1_r6, `param_names`.`id` AS t2_r0, `param_names`.`name` AS t2_r1,
       `param_names`.`updated_at` AS t2_r2, `param_names`.`created_at` AS t2_r3,
       `puppet_tags`.`id` AS t3_r0, `puppet_tags`.`name` AS t3_r1, `puppet_tags`.`updated_at` AS t3_r2,
       `puppet_tags`.`created_at` AS t3_r3, `resource_tags_puppet_tags`.`id` AS t4_r0,
       `resource_tags_puppet_tags`.`resource_id` AS t4_r1, `resource_tags_puppet_tags`.`puppet_tag_id` AS t4_r2,
       `resource_tags_puppet_tags`.`updated_at` AS t4_r3, `resource_tags_puppet_tags`.`created_at` AS t4_r4
FROM `resources`
LEFT OUTER JOIN `param_values` ON param_values.resource_id = resources.id
LEFT OUTER JOIN `param_names` ON `param_names`.id = `param_values`.param_name_id
LEFT OUTER JOIN `resource_tags` ON (`resources`.`id` = `resource_tags`.`resource_id`)
LEFT OUTER JOIN `puppet_tags` ON (`puppet_tags`.`id` = `resource_tags`.`puppet_tag_id`)
LEFT OUTER JOIN `resource_tags` resource_tags_puppet_tags ON resource_tags_puppet_tags.puppet_tag_id = puppet_tags.id
WHERE (host_id != 6
  AND (exported=1 AND restype='Nagios_service')
  AND (puppet_tags.name = 'nagiosserver.domain.tld'))
</pre>
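The difference seems to be the extra self-join on `resource_tags`: the alias `resource_tags_puppet_tags` is only joined on `puppet_tag_id = puppet_tags.id`, with nothing tying it back to `resources.id`, so each matching resource presumably gets duplicated once per resource carrying the same tag. A hand-written query along these lines (same tables and columns as above; this is just a sketch, not something puppet itself generates) should show the multiplication:

<pre>
-- Hand-written check, not generated by puppet: compare the number of rows the
-- joins produce with the number of distinct resources actually matching the
-- collector condition.
SELECT COUNT(*)                     AS joined_rows,
       COUNT(DISTINCT resources.id) AS matching_resources
FROM `resources`
LEFT OUTER JOIN `param_values` ON param_values.resource_id = resources.id
LEFT OUTER JOIN `resource_tags` ON resources.id = resource_tags.resource_id
LEFT OUTER JOIN `puppet_tags` ON puppet_tags.id = resource_tags.puppet_tag_id
LEFT OUTER JOIN `resource_tags` resource_tags_puppet_tags
       ON resource_tags_puppet_tags.puppet_tag_id = puppet_tags.id
WHERE resources.host_id != 6
  AND resources.exported = 1
  AND resources.restype = 'Nagios_service'
  AND puppet_tags.name = 'nagiosserver.domain.tld';
</pre>

If I read the joins correctly, all my exported nagios resources carry the same "nagiosserver.domain.tld" tag, so that last join alone would multiply every (resource, parameter) row by the number of tagged resources, which looks like enough to account for the 202879 rows.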
I've tried running the first SQL query on the 0.25.1rc2 server. It works fine and returns 80 exported resources. Running the second query on the 0.24.8 server returns 2588158 rows (930 MB). I'll have to postpone migrating this server to 0.25 :-)

On the 0.25.1rc2 server, I have the following gems installed:

<pre>
activerecord (2.3.2)
activesupport (2.3.2)
fastthread (1.0.7)
passenger (2.2.2)
rack (1.0.0)
rake (0.8.7)
</pre>

On the 0.24.8 server, I have exactly the same gems, except activesupport, which isn't installed.

Please let me know if I can give more details!

----------------------------------------
Bug #2691: "Could not retrieve catalog: HTTP-Error: 500 Internal Server Error" with tagged exported resources
http://projects.reductivelabs.com/issues/2691

Author: Marc Fournier
Status: Needs more information
Priority: Normal
Assigned to:
Category:
Target version:
Affected version: 0.25.1rc2
Keywords:
Branch:

The problem happens with this sort of manifest (in fact, this simple example works fine):

<pre>
node a {
  @@file { "/tmp/foo":
    content => "fjskfjs\n",
    tag     => "foofile",
  }
}

node b {
  File <<| tag == 'foofile' |>>
}
</pre>

In my case, I have about 20 nodes, each one having:

<pre>
@@nagios_host { $fqdn:
  tag => "nagiosserver.domain.tld",
  ...
}
@@nagios_service { "a":
  tag => "nagiosserver.domain.tld",
  ...
}
@@nagios_service { "b":
  tag => "nagiosserver.domain.tld",
  ...
}
@@nagios_service { "c":
  tag => "nagiosserver.domain.tld",
  ...
}
@@nagios_service { "d":
  tag => "nagiosserver.domain.tld",
  ...
}
@@nagios_service { "e":
  tag => "nagiosserver.domain.tld",
  ...
}
</pre>

I suspect the problem occurs when many exported resources are involved (if 20x5 can be considered many). The problem happens when running puppet on a node containing:

<pre>
Nagios_host <<| tag == "nagiosserver.domain.tld" |>>
Nagios_service <<| tag == "nagiosserver.domain.tld" |>>
</pre>

On the puppetmaster, one ruby process consumes 100% CPU for a few minutes, then the client says:

<pre>
err: Could not call puppetmaster.getconfig: #<RuntimeError: HTTP-Error: 500 Internal Server Error>
err: Could not retrieve catalog: HTTP-Error: 500 Internal Server Error
</pre>

The ruby process continues running at 100%, and after a while we get this backtrace:

<pre>
[ pid=3338 file=ext/apache2/Hooks.cpp:547 time=2009-10-01 11:16:46.189 ]: Backend process 3465 did not return a valid HTTP response. It returned no data.
*** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) (process 3465):
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/request_handler.rb:83:in `write'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/request_handler.rb:83:in `process_request'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/abstract_request_handler.rb:203:in `main_loop'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/application_spawner.rb:110:in `run'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/application_spawner.rb:67:in `spawn_application'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/utils.rb:181:in `safe_fork'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/application_spawner.rb:60:in `spawn_application'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/application_spawner.rb:45:in `spawn_application'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/spawn_manager.rb:158:in `spawn_application'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/spawn_manager.rb:282:in `handle_spawn_application'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/abstract_server.rb:337:in `__send__'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/abstract_server.rb:337:in `main_loop'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/abstract_server.rb:187:in `start_synchronously'
	from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/bin/passenger-spawn-server:61
</pre>

The problem can be reproduced with 0.24.8 and 0.25.0 clients, against a 0.25.0 as well as a 0.25.1rc1 server, with passenger and these gems installed:

<pre>
activerecord (2.3.2)
activesupport (2.3.2)
fastthread (1.0.7)
passenger (2.2.2)
rack (1.0.0)
rake (0.8.7)
</pre>

It should be noted that this works fine:

<pre>
Nagios_host <<| |>>
Nagios_service <<| |>>
</pre>

I've also noticed the following error from time to time on 0.25.0.
But I'm unable to reproduce it, and I have never seen it with 0.25.1rc1, so I'm not sure it's related:

<pre>
/srv/puppet/lib/puppet/util/settings/file_setting.rb:19:in `group='
/srv/puppet/lib/puppet/util/settings/setting.rb:44:in `send'
/srv/puppet/lib/puppet/util/settings/setting.rb:44:in `initialize'
/srv/puppet/lib/puppet/util/settings/setting.rb:38:in `each'
/srv/puppet/lib/puppet/util/settings/setting.rb:38:in `initialize'
/srv/puppet/lib/puppet/util/settings.rb:398:in `new'
/srv/puppet/lib/puppet/util/settings.rb:398:in `newsetting'
/srv/puppet/lib/puppet/util/settings.rb:533:in `setdefaults'
/srv/puppet/lib/puppet/util/settings.rb:518:in `each'
/srv/puppet/lib/puppet/util/settings.rb:518:in `setdefaults'
/srv/puppet/lib/puppet/reports/store.rb:15:in `mkclientdir'
/srv/puppet/lib/puppet/reports/store.rb:35:in `process'
/srv/puppet/lib/puppet/network/handler/report.rb:66:in `process'
/srv/puppet/lib/puppet/network/handler/report.rb:59:in `each'
/srv/puppet/lib/puppet/network/handler/report.rb:59:in `process'
/srv/puppet/lib/puppet/network/handler/report.rb:33:in `report'
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `to_proc'
/srv/puppet/lib/puppet/network/xmlrpc/processor.rb:52:in `call'
/srv/puppet/lib/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/srv/puppet/lib/puppet/network/xmlrpc/processor.rb:85:in `setup_processor'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:366:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:378:in `handle'
/srv/puppet/lib/puppet/network/xmlrpc/processor.rb:44:in `process'
/srv/puppet/lib/puppet/network/http/rack/xmlrpc.rb:35:in `process'
/srv/puppet/lib/puppet/network/http/rack.rb:48:in `call'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/request_handler.rb:81:in `process_request'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/abstract_request_handler.rb:203:in `main_loop'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/application_spawner.rb:110:in `run'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/application_spawner.rb:67:in `spawn_application'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/utils.rb:181:in `safe_fork'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/application_spawner.rb:60:in `spawn_application'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/rack/application_spawner.rb:45:in `spawn_application'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/spawn_manager.rb:158:in `spawn_application'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/spawn_manager.rb:282:in `handle_spawn_application'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/abstract_server.rb:337:in `__send__'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/abstract_server.rb:337:in `main_loop'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_passenger/abstract_server.rb:187:in `start_synchronously'
/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/bin/passenger-spawn-server:61
</pre>
