On 26.09.2013 20:38, Matthew Arguin wrote:
David, it's completely possible that there is a bug in my manifest somewhere. My coworker has been doing most of the work getting our Puppet system set up this way and I am working my way into it, so I will certainly be trying to see if there is duplication that need not be there. However, I do know that we have a pretty sizable number of checks (at least in my opinion, for the number of servers): for a total of 14 nodes, about 1100 active checks and 300 passive.
So that should result in on the order of 2800 resources for the checks (1100 + 300, once for the actual node and once for the monitoring host).
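The arithmetic above can be spelled out in a trivial sketch (just the numbers quoted in the thread, nothing more):

```python
# Numbers from the thread: 14 nodes, 1100 active and 300 passive Nagios checks.
active_checks = 1100
passive_checks = 300
checks = active_checks + passive_checks   # 1400 checks in total

# Each check is declared twice: once exported by the monitored node,
# and once collected on the central monitoring host.
expected_resources = checks * 2
print(expected_resources)  # 2800
```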
That would indicate that you are storing 8-10 resources per check, which suggests a certain "potential" for "optimisation".
Regards, David
On Thu, Sep 26, 2013 at 2:27 PM, Deepak Giridharagopal <[email protected]> wrote:

On Sep 26, 2013, at 8:20 AM, Matthew Arguin <[email protected]> wrote:

So my reasoning behind the initial question/post is, again, due largely to being unfamiliar with PuppetDB, I would say. We do export a lot of resources in our Puppet deployment due to the Nagios checks. In poking around on the groups, I came across this post: https://groups.google.com/forum/#!topic/puppet-users/z1kjqwko1iA. I was especially interested in the comment posted by windowsrefund at the bottom and am trying to understand it, because it seems like he is saying that I could reduce the amount of duplication of exported resources, but I am not entirely sure.

Basic questions: Is it "bad" to have resource duplication? Is it "good" to have catalog duplication? Should I just forget about the 20000 default on the query param, or should I be aiming to tune my Puppet deployment to work towards that? (Currently set to 50000 to stop the issue.)

A few definitions that may help (I should really add this to the FAQ!):

A resource is considered "duplicated" if it exists, identically, on more than one system. More specifically: if a resource with the same type, title, parameters, and other metadata exists on more than one node in PuppetDB, then that resource is considered duplicated. So a resource duplication rate of, say, 40% means that 60% of your resources exist only on one system. I like to think of this as the "snowflake quotient": it's a measurement of how many of your resources are unique and beautiful snowflakes.

A catalog is considered "duplicated" if it's identical to the previous catalog that PuppetDB has stored. So if you have a node foo.com, run Puppet on it twice, and the catalog hasn't changed for that system (you haven't made a config change that affects that system between runs), then that's considered a catalog duplicate.
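The "duplicated resource" definition above can be illustrated with a short sketch. The `resource_key` helper, the sample catalogs, and the rate formula here are invented for illustration; they are not PuppetDB's actual implementation, which does this hashing inside the database:

```python
import hashlib
import json

def resource_key(resource):
    """Identity of a resource: its type, title, and parameters.
    A resource whose key appears on several nodes counts as duplicated."""
    payload = [resource["type"], resource["title"], resource["parameters"]]
    return hashlib.sha1(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def duplication_rate(catalogs):
    """catalogs maps node name -> list of resources. Returns the fraction of
    resource occurrences that also exist, identically, on another node."""
    nodes_per_key = {}
    for node, resources in catalogs.items():
        for r in resources:
            nodes_per_key.setdefault(resource_key(r), set()).add(node)
    total = sum(len(resources) for resources in catalogs.values())
    duplicated = sum(
        1
        for resources in catalogs.values()
        for r in resources
        if len(nodes_per_key[resource_key(r)]) > 1
    )
    return duplicated / total

# Two nodes sharing one identical package resource, plus one unique file each:
catalogs = {
    "web1": [
        {"type": "Package", "title": "ntp", "parameters": {"ensure": "present"}},
        {"type": "File", "title": "/etc/motd", "parameters": {"content": "web1"}},
    ],
    "web2": [
        {"type": "Package", "title": "ntp", "parameters": {"ensure": "present"}},
        {"type": "File", "title": "/etc/motd", "parameters": {"content": "web2"}},
    ],
}
print(duplication_rate(catalogs))  # 0.5 - half the resources are "snowflakes"
```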
Internally, PuppetDB uses both of these concepts to improve performance. If a new catalog is exactly the same as the previously stored one for a node, then there's no need to use up IO to store it again. Similarly, if a catalog contains 90% of the same resources that already exist on other nodes, PuppetDB doesn't need to store those resources either; rather, we can just store pointers to already-existing data in the database.

Now, are the numbers you posted good/bad? In the field, we overwhelmingly see resource duplication and catalog duplication in the 85-95% range. So I'd say that your low resource duplication rate is atypical. It may indicate that you are perhaps not leveraging abstractions in your Puppet code, or it could be that you really, truly have a large number of unique resources. One thing I can definitely say, though, is that the higher your resource duplication rate, the faster PuppetDB will run.

Now, regarding the max query results: I'd set that to whatever works for you. If you're doing queries that return a huge number of results, then feel free to bump that setting up. The only caveat is, as mentioned before, that you need to make sure you give PuppetDB enough heap to actually deal with a result set of that size.

Lastly, as Ken Barber indicated, we've already merged in code that eliminates the need for that setting. We now stream resource query results to the client on the fly, avoiding batching things up in memory first. This results in much lower memory usage, and greatly reduces the time before the client gets the first result. So... problem solved? :)

deepak

If I did not mention it previously, heap is currently set to 1G, and looking at the spark line, I seem to be maxing out right now at about 500MB.
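The catalog-deduplication idea described above can be sketched in a few lines. The `CatalogStore` class and its hashing scheme are hypothetical stand-ins for illustration; PuppetDB's real storage is a relational database, not an in-memory dict:

```python
import hashlib
import json

class CatalogStore:
    """Skip the write when a node's new catalog hashes to the same value
    as the one already stored, mirroring the IO saving described above."""

    def __init__(self):
        self.last_hash = {}  # node name -> hash of last stored catalog
        self.writes = 0

    def store(self, node, catalog):
        digest = hashlib.sha1(
            json.dumps(catalog, sort_keys=True).encode()
        ).hexdigest()
        if self.last_hash.get(node) == digest:
            return False     # catalog duplicate: nothing to do
        self.last_hash[node] = digest
        self.writes += 1     # only an actually-changed catalog costs IO
        return True

store = CatalogStore()
catalog = {"resources": [{"type": "Package", "title": "ntp"}]}
store.store("foo.com", catalog)  # first run: stored
store.store("foo.com", catalog)  # second run, unchanged: skipped
print(store.writes)              # 1
```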
On Thu, Sep 26, 2013 at 3:33 AM, David Schmitt <[email protected]> wrote:

On 26.09.2013 05:17, Christopher Wood wrote:

On Wed, Sep 25, 2013 at 02:25:50PM +0100, Ken Barber wrote:
(SNIP) http://puppetdb1.vm:8080/dashboard/index.html. Since Puppet doesn't put a limit on the number of resources per node, it's hard to say if your case is a problem somewhere. It does however sound exceptional, but not unlikely (I've seen some nodes with 10k resources apiece, for example).

> Now I'm curious about who these people are

Me, for example.

> why they need 10,000 resources per host

Such numbers are easy to reach when every service exports a Nagios check into a central server.

> how they keep track of everything

High modularity. See below.

> how long an agent run takes

Ages. The biggest node I know takes around 44 minutes to run.

> and how much cpu/ram an agent run takes

Too much.

> and how they troubleshoot the massive debug output

Since these 10k+ resources are 99% the same, there is not much to troubleshoot.

Regards, David
-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/puppet-users. For more options, visit https://groups.google.com/groups/opt_out.
