Deepak, thank you very much for that detail. It certainly clears up some things for me. I am guessing that I might have an exec or something somewhere (complete conjecture at this point) on a machine or two that might be leading to my low percentage on resource duplication, since my catalog percentage is at 96%...
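The usual way an exec drags the duplication rate down is by interpolating node-specific facts into its title or command. A minimal sketch (hypothetical commands, not from this thread) contrasting a per-node exec with one that deduplicates cleanly:

```puppet
# Hypothetical: this exec is never deduplicated across nodes, because
# both the title and the command embed node-specific facts.
exec { "register-${::hostname}":
  command => "/usr/local/bin/register-node --name ${::hostname} --ip ${::ipaddress}",
  path    => ['/bin', '/usr/bin'],
  unless  => "/usr/local/bin/register-node --check --name ${::hostname}",
}

# By contrast, this exec is byte-for-byte identical on every node that
# declares it, so PuppetDB stores it once and points other catalogs at it.
exec { 'refresh-package-index':
  command     => '/usr/bin/apt-get update',
  path        => ['/bin', '/usr/bin'],
  refreshonly => true,
}
```

Even one such resource per node only costs a handful of snowflakes, though; a rate far below the 85-95% range usually points at a whole class of per-node resources.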
David, it is completely possible that there is a bug in my manifest somewhere. My coworker has been doing most of the work getting our Puppet system set up this way and I am working my way into it, so I will certainly be trying to see if there is duplication that doesn't need to be there. However, I do know that we have a pretty sizable number of checks (at least in my opinion, for the number of servers) across a total of 14 nodes: about 1100 active checks and 300 passive.

On Thu, Sep 26, 2013 at 2:27 PM, Deepak Giridharagopal <[email protected]> wrote:

> On Sep 26, 2013, at 8:20 AM, Matthew Arguin <[email protected]> wrote:
>
>> So my reasoning behind the initial question/post is due largely to being unfamiliar with PuppetDB, I would say. We do export a lot of resources in our Puppet deployment because of the nagios checks. In poking around on the groups, I came across this post:
>> https://groups.google.com/forum/#!topic/puppet-users/z1kjqwko1iA
>>
>> I was especially interested in the comment posted by windowsrefund at the bottom, and am trying to understand it because it seems like he is saying that I could reduce the amount of duplication of exported resources, but I am not entirely sure.
>>
>> Basic questions: Is it "bad" to have resource duplication? Is it "good" to have catalog duplication? Should I just forget about the 20000 default on the query param, or should I be aiming to tune my Puppet deployment to work towards that? (Currently set to 50000 to stop the issue.)
>
> A few definitions that may help (I should really add this to the FAQ!):
>
> A resource is considered "duplicated" if it exists, identically, on more than one system. More specifically: if a resource with the same type, title, parameters, and other metadata exists on more than one node in PuppetDB, then that resource is considered duplicated. So a resource duplication rate of, say, 40% means that 60% of your resources exist only on one system.
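By that definition, exported nagios checks almost never deduplicate: each one necessarily embeds the exporting node's name, so every check is a snowflake by construction. A sketch of the common export/collect pattern behind numbers like Matthew's 1100 checks (hypothetical check names):

```puppet
# Exported from every monitored node: the title and host_name embed
# $::fqdn, so no two nodes ever export an identical resource.
@@nagios_service { "check_ssh_${::fqdn}":
  check_command       => 'check_ssh',
  host_name           => $::fqdn,
  service_description => 'SSH',
  use                 => 'generic-service',
  tag                 => 'nagios',
}

# Collected once, on the nagios server:
Nagios_service <<| tag == 'nagios' |>>
```

With 14 nodes each exporting roughly 80 such checks, a low resource duplication rate can fall out of the design without anything actually being wrong.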
> I like to think of this as the "snowflake quotient"... it's a measurement of how many of your resources are unique and beautiful snowflakes.
>
> A catalog is considered "duplicated" if it's identical to the previous catalog that PuppetDB has stored. So if you have a node foo.com, run puppet on it twice, and the catalog hasn't changed for that system (you haven't made a config change that affects that system between runs), then that's considered a catalog duplicate.
>
> Internally, PuppetDB uses both of these concepts to improve performance. If a new catalog is exactly the same as the previously stored one for a node, then there's no need to use up IO to store it again. Similarly, if a catalog contains 90% the same resources that already exist on other nodes, PuppetDB doesn't need to store those resources either (rather, we can just store pointers to already-existing data in the database).
>
> Now, are the numbers you posted good/bad? In the field, we overwhelmingly see resource duplication and catalog duplication in the 85-95% range. So I'd say that your low resource duplication rate is atypical. It may indicate that you are perhaps not leveraging abstractions in your Puppet code, or it could be that you really, truly have a large number of unique resources. One thing I can definitely say, though, is that the higher your resource duplication rate, the faster PuppetDB will run.
>
> Now, regarding the max query results: I'd set that to whatever works for you. If you're doing queries that return a huge number of results, then feel free to bump that setting up. The only caveat is, as mentioned before, you need to make sure you give PuppetDB enough heap to actually deal with that size of result set.
>
> Lastly, as Ken Barber indicated, we've already merged in code that eliminates the need for that setting. We now stream resource query results to the client on-the-fly, avoiding batching things up in memory first.
> This results in much lower memory usage, and greatly reduces the time before the client gets the first result. So... problem solved? :)
>
> deepak
>
>> If I did not mention it previously, heap is currently set to 1G, and looking at the spark line I seem to be maxing out right now at about 500MB.
>>
>> On Thu, Sep 26, 2013 at 3:33 AM, David Schmitt <[email protected]> wrote:
>>
>>> On 26.09.2013 05:17, Christopher Wood wrote:
>>>
>>>> On Wed, Sep 25, 2013 at 02:25:50PM +0100, Ken Barber wrote:
>>>>
>>>>> (SNIP)
>>>>>
>>>>> http://puppetdb1.vm:8080/dashboard/index.html. Since Puppet doesn't put a limit on # of resources per node, it's hard to say if your case is a problem somewhere. It does, however, sound exceptional but not unlikely (I've seen some nodes with 10k resources apiece, for example).
>>>>
>>>> Now I'm curious about:
>>>>
>>>> who these people are
>>>
>>> Me, for example.
>>>
>>>> why they need 10,000 resources per host
>>>
>>> Such numbers are easy to reach when every service exports a nagios check into a central server.
>>>
>>>> how they keep track of everything
>>>
>>> High modularity. See below.
>>>
>>>> how long an agent run takes
>>>
>>> Ages. The biggest node I know takes around 44 minutes to run.
>>>
>>>> and how much cpu/ram an agent run takes
>>>
>>> Too much.
>>>
>>>> and how they troubleshoot the massive debug output
>>>
>>> Since these 10k+ resources are 99% the same, there is not much to troubleshoot.
>>>
>>> Regards, David
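David's "high modularity" answer typically means funnelling every check through a shared defined type, so thousands of exported resources share one structure and differ only in data. A sketch in period-appropriate Puppet 3 style (`monitoring::check` is a hypothetical name, not from this thread):

```puppet
# Hypothetical wrapper: services declare checks uniformly, so every
# exported nagios_service has identical shape and differs only in
# its parameters.
define monitoring::check ($command, $description) {
  @@nagios_service { "${title}_${::fqdn}":
    check_command       => $command,
    host_name           => $::fqdn,
    service_description => $description,
    use                 => 'generic-service',
    tag                 => 'nagios',
  }
}

# Any service module can then add itself to monitoring in one line:
monitoring::check { 'http':
  command     => 'check_http',
  description => 'HTTP',
}
```

This keeps 10k+ resources tractable for humans; it does not help PuppetDB's duplication rate, since each generated resource still embeds `$::fqdn`.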
--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/puppet-users.
For more options, visit https://groups.google.com/groups/opt_out.
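For reference, the query cap discussed in this thread was a PuppetDB setting at the time; a sketch of where it lived in the 1.x line (paths vary by package, and later releases removed the setting once query results were streamed — check the docs for your version):

```ini
; /etc/puppetdb/conf.d/config.ini
[database]
; Maximum rows a resource query may return: the "20000 default" above.
; Matthew raised it to 50000 to stop the failing queries.
resource-query-limit = 50000
```

Heap itself is set separately, via `JAVA_ARGS="-Xmx1g"` in `/etc/sysconfig/puppetdb` (RHEL) or `/etc/default/puppetdb` (Debian); Matthew's 1G with a ~500MB peak on the dashboard spark line suggests modest headroom for larger result sets.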
