Issue #1812 has been updated by luke.

Status changed from Closed to Re-opened
Assigned to changed from jamtur01 to luke

This still needs a little work.

----------------------------------------
Bug #1812: YAML files corrupted on server (due to high load?)
http://projects.reductivelabs.com/issues/show/1812

Author: nigelk2
Status: Re-opened
Priority: Urgent
Assigned to: luke
Category: plumbing
Target version: 0.24.7
Complexity: Unknown
Affected version: 0.24.6
Keywords:

From peter's mail to puppet-dev:

> Hi
>
> It looks like the node YAML for a given node can get broken. I have
> seen this a handful of times now, and each time only a few (2-3)
> nodes were affected.
>
> So what's the actual problem?
>
> Suddenly I find log entries like:
>
> Tue Dec 09 15:34:27 +0100 2008 Puppet (err): Could not read YAML data
> for node foobar: syntax error on line 11, col 14: ` xen_domains: "3"'
>
> in the puppetmaster.log, and the master can't compile the node -> the
> node therefore won't get newer manifests. It also looks like the node
> itself gets into a corrupted state and is unable to apply a cached
> manifest.
>
> I can fix this problem by deleting the YAML file of that node in
> $puppetmaster_dir/yaml/node/ .
>
> It often looks like the master was under high load when this
> corruption occurs. I haven't yet found a way to reproduce it, but from
> discussion in IRC it looks like other people also hit this problem
> randomly. Random in that it's not always the same node that has the
> problem, and in that it happens very rarely.
>
> So this certainly looks like a bug. However, I was unsure whether the
> data I have gathered so far is sufficient to file a bug. Also, since
> this is still a something-happens-magically situation, I'd like to
> investigate a bit more and maybe even come up with a solution, or at
> least an idea for a solution.
>
> It looks like the YAML data got broken; possibly, due to the high
> load, there were problems during the transmission or writing.
> Deleting the corrupt YAML file fixes the problem, and as far as I saw
> it doesn't have any impact on the next run of the node.
> After examining the logs on the master and the client, it looks like
> the problem first occurs on the master. At the time it first happened,
> it is plausible that the master was under very high load.
>
> One solution I thought of is to simply delete the YAML file on the
> master. The client could then exit with an error (like the present
> one), and on its next run everything would be fine.
> But this might not be the right fix, as I can't yet see when the YAML
> file is transferred, nor what impact it actually has on compiling the
> manifest etc. We could also simply delete it and restart the
> client-run procedure (if that is possible), so we can fix the problem
> within a client run (maybe with a maximum of 3 retries).
> Another option might be to check whether the YAML data gets stored
> correctly, and if not, and the YAML in memory is still correct,
> rewrite it; otherwise request it again from the client.
> Another idea I had is that it might be a problem in Ruby's YAML
> library or whatever.
>
> So do you guys think this is certainly a bug, and what would be the
> best place to look for the actual problem, and what might be the best
> solution for it?
>
> Testing the solution would be very easy: simply corrupt the YAML file
> and see if puppet behaves the expected way.
> However, I'm still really unsure how to reproduce the actual cause.
>
> Thanks for additional ideas or information. If I have a more concrete
> idea of what the actual source of the problem might be and what might
> be the best way to fix it, I'll be more confident filing a bug.
>
> cheers pete

Corroborated by myself and Oliver Hookins
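
For anyone wanting to try the "delete the broken file and carry on" idea from the mail above, here is a rough sketch. It is not Puppet's code: load_node_yaml_or_discard is a hypothetical helper name, and it assumes a current Ruby where the bundled YAML library raises Psych::SyntaxError on unparseable input (older Ruby versions may raise a different exception class). The second half follows the suggested test: corrupt a file on purpose and see what happens.

```ruby
require 'yaml'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, not part of Puppet.
# Returns the parsed data, or nil after deleting a file that fails to parse.
def load_node_yaml_or_discard(path)
  YAML.load_file(path)
rescue Psych::SyntaxError => e
  # Assumption: a file that cannot be parsed is safe to throw away,
  # as reported in the mail (the next run recreates it).
  warn "Discarding unreadable YAML #{path}: #{e.message}"
  FileUtils.rm_f(path)
  nil
end

# Reproduce the symptom by writing a deliberately truncated file.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'foobar.yaml')
  # The closing quote is missing, as if the write had been cut short.
  File.write(path, "name: foobar\nparameters:\n  xen_domains: \"3\n")

  data = load_node_yaml_or_discard(path)
  puts "parsed data: #{data.inspect}"
  puts "file still present: #{File.exist?(path)}"
end
```

This only covers the cleanup side. It does not explain why the file gets corrupted in the first place, which is still the open question in this ticket.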
