Issue #1812 has been updated by luke.

Status changed from Closed to Re-opened
Assigned to changed from jamtur01 to luke

This still needs a little work.
----------------------------------------
Bug #1812: YAML files corrupted on server (due to high load?)
http://projects.reductivelabs.com/issues/show/1812

Author: nigelk2
Status: Re-opened
Priority: Urgent
Assigned to: luke
Category: plumbing
Target version: 0.24.7
Complexity: Unknown
Affected version: 0.24.6
Keywords: 


From peter's mail to puppet-dev:


> Hi
> 
> it looks like the node YAML file for a given node can get corrupted. I
> have seen this a handful of times now, and each time only a few (2-3)
> nodes were affected.
> 
> So what's the actual problem?
> 
> Suddenly I find log entries like:
> 
> Tue Dec 09 15:34:27 +0100 2008 Puppet (err): Could not read YAML data
> for node foobar: syntax error on line 11, col 14: `  xen_domains: "3"'
> 
> in the puppetmaster.log, and the master can't compile the node. The
> node therefore won't get newer manifests; moreover, it looks like the node
> itself ends up in a corrupted state and is unable to apply a cached manifest.
> 
> I can fix this problem by deleting the YAML file of the affected node in
> $puppetmaster_dir/yaml/node/.
> 
> The master often appears to be under high load when this corruption
> occurs. I haven't yet found a way to reproduce it, but from discussion
> on IRC it looks like other people also hit this problem at random: it's
> not always the same node that is affected, and it happens only very
> rarely.
> 
> So this certainly looks like a bug. However, I was unsure whether the
> data I have gathered so far is sufficient to file one. Since I am still
> in a something-happens-magically situation, I'd rather investigate a bit
> more and maybe even come up with a solution, or at least an idea for
> one.
> 
> It looks like the YAML data got broken; under the high load there may
> have been problems during transmission or writing. Deleting the corrupt
> YAML file fixes the problem and, as far as I saw, has no impact on the
> next run of the node.
> After examining the logs on the master and the client, it looks like the
> problem first occurs on the master. At the time it first happened, the
> master may well have been under very high load.
> 
> One solution I thought of would be to simply delete the YAML file on the
> master. The client could then exit with an error (like the present one),
> and on its next run everything would be fine.
> But this might not be the right fix, as I can't yet see when the YAML
> file is transferred, nor what impact it actually has on compiling the
> manifest etc. We could also simply delete it and restart the client-run
> procedure (if that is possible), so that the problem is fixed within a
> single client run (maybe with a maximum of 3 retries).
> Another option might be to check whether the YAML data was stored
> correctly and, if not, rewrite it if the YAML in memory is still correct,
> or otherwise request it again from the client.
> Another idea I had is that it might be a problem in Ruby's YAML library
> or something similar.
> 
> So do you think this is definitely a bug? What would be the best place
> to look for the actual problem, and what might be the best solution
> for it?
> 
> Testing the solution would be very easy: simply corrupt the YAML file
> and see whether Puppet behaves as expected.
> However, I'm still really unsure how to reproduce the actual cause.
> 
> Thanks for any additional ideas or information. Once I have a more
> concrete idea of what the actual source of the problem is and how best
> to fix it, I'll be more confident about filing a bug.
> 
> cheers pete
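
The quoted mail suggests deleting the corrupt YAML file and testing by
deliberately corrupting one. A minimal sketch of both, assuming a current
Ruby where YAML parsing is done by Psych and parse failures raise
Psych::SyntaxError (the Ruby versions in use with 0.24.x had a different
parser, so the error class there may differ). load_node_yaml is a
hypothetical name for illustration, not a Puppet method, and the file
contents are made up:

```ruby
require "yaml"
require "tmpdir"

# Hypothetical loader, not Puppet's code: returns the parsed node data, or
# nil after deleting a file that cannot be parsed as YAML, so that the
# caller can regenerate it on the next run.
def load_node_yaml(path)
  YAML.load(File.read(path))
rescue Psych::SyntaxError
  File.delete(path)
  nil
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "foobar.yaml")
  # Simulate an interrupted write: the quoted value is cut off before its
  # closing quote, which the parser rejects.
  File.write(path, "name: foobar\nparameters:\n  xen_domains: \"3")
  result = load_node_yaml(path)
  puts "loaded: #{result.inspect}, file still present: #{File.exist?(path)}"
end
```

Note that not every truncation produces a syntax error; a file cut off at a
line boundary can still parse and silently lose data, which this sketch
would not detect.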

Corroborated by myself and Oliver Hookins
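
One of the options raised in the mail is to check that the YAML data was
stored correctly and to rewrite it from memory if not. A rough sketch of
that idea follows. store_node_yaml is a hypothetical helper, not Puppet's
code. Writing to a temporary file and renaming it into place is an addition
here rather than something proposed in the mail, and the sketch assumes the
node data is a plain hash of strings rather than a serialized Ruby object:

```ruby
require "yaml"

# Hypothetical helper, not Puppet's actual code. Writes `data` to a temporary
# file, re-reads it to confirm it parses back to the same value, and only then
# renames it over `path`. Returns true on success, false after max_retries.
def store_node_yaml(path, data, max_retries = 3)
  tmp = "#{path}.tmp.#{Process.pid}"
  max_retries.times do
    begin
      File.open(tmp, "w") { |f| f.write(YAML.dump(data)) }
      if YAML.load(File.read(tmp)) == data
        # Same directory, hence same filesystem: readers see either the old
        # file or the complete new one, never a partial write.
        File.rename(tmp, path)
        return true
      end
    rescue StandardError
      # Write or parse failed; retry from the in-memory copy.
    end
  end
  false
ensure
  File.delete(tmp) if tmp && File.exist?(tmp)
end
```

A call such as store_node_yaml(path, { "name" => "foobar" }) returns true
once the file at path is known to parse back to the same hash. Whether this
addresses the actual cause of the corruption is unknown, since the cause has
not been reproduced.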

