Deepak, thank you very much for that detail. It certainly clears up some things for me. I am guessing that I might have an exec or something somewhere (complete conjecture at this point) on a machine or two that might be leading to my low percentage on resource duplication, since my catalog percentage is at 96%...
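The usual way an exec drags the duplication rate down is by interpolating node-specific facts into its title or command. A minimal sketch (hypothetical commands, not from this thread) contrasting a per-node exec with one that deduplicates cleanly:

```puppet
# Hypothetical: this exec is never deduplicated across nodes, because
# both the title and the command embed node-specific facts.
exec { "register-${::hostname}":
  command => "/usr/local/bin/register-node --name ${::hostname} --ip ${::ipaddress}",
  path    => ['/bin', '/usr/bin'],
  unless  => "/usr/local/bin/register-node --check --name ${::hostname}",
}

# By contrast, this exec is byte-for-byte identical on every node that
# declares it, so PuppetDB stores it once and points other catalogs at it.
exec { 'refresh-package-index':
  command     => '/usr/bin/apt-get update',
  path        => ['/bin', '/usr/bin'],
  refreshonly => true,
}
```

Even one such resource per node only costs a handful of snowflakes, though; a rate far below the 85-95% range usually points at a whole class of per-node resources.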
David, it is completely possible that there is a bug in my manifest somewhere. My coworker has been doing most of the work getting our Puppet system set up this way and I am working my way into it, so I will certainly be trying to see if there is duplication that doesn't need to be there. However, I do know that we have a pretty sizable number of checks (at least in my opinion, for the number of servers) across a total of 14 nodes: about 1100 active checks and 300 passive.

On Thu, Sep 26, 2013 at 2:27 PM, Deepak Giridharagopal <[email protected]> wrote:

> On Sep 26, 2013, at 8:20 AM, Matthew Arguin <[email protected]> wrote:
>
>> So my reasoning behind the initial question/post is due largely to being unfamiliar with PuppetDB, I would say. We do export a lot of resources in our Puppet deployment because of the nagios checks. In poking around on the groups, I came across this post:
>> https://groups.google.com/forum/#!topic/puppet-users/z1kjqwko1iA
>>
>> I was especially interested in the comment posted by windowsrefund at the bottom, and am trying to understand it because it seems like he is saying that I could reduce the amount of duplication of exported resources, but I am not entirely sure.
>>
>> Basic questions: Is it "bad" to have resource duplication? Is it "good" to have catalog duplication? Should I just forget about the 20000 default on the query param, or should I be aiming to tune my Puppet deployment to work towards that? (Currently set to 50000 to stop the issue.)
>
> A few definitions that may help (I should really add this to the FAQ!):
>
> A resource is considered "duplicated" if it exists, identically, on more than one system. More specifically: if a resource with the same type, title, parameters, and other metadata exists on more than one node in PuppetDB, then that resource is considered duplicated. So a resource duplication rate of, say, 40% means that 60% of your resources exist only on one system.
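By that definition, exported nagios checks almost never deduplicate: each one necessarily embeds the exporting node's name, so every check is a snowflake by construction. A sketch of the common export/collect pattern behind numbers like Matthew's 1100 checks (hypothetical check names):

```puppet
# Exported from every monitored node: the title and host_name embed
# $::fqdn, so no two nodes ever export an identical resource.
@@nagios_service { "check_ssh_${::fqdn}":
  check_command       => 'check_ssh',
  host_name           => $::fqdn,
  service_description => 'SSH',
  use                 => 'generic-service',
  tag                 => 'nagios',
}

# Collected once, on the nagios server:
Nagios_service <<| tag == 'nagios' |>>
```

With 14 nodes each exporting roughly 80 such checks, a low resource duplication rate can fall out of the design without anything actually being wrong.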
> I like to think of this as the "snowflake quotient"... it's a measurement of how many of your resources are unique and beautiful snowflakes.
>
> A catalog is considered "duplicated" if it's identical to the previous catalog that PuppetDB has stored. So if you have a node foo.com, run puppet on it twice, and the catalog hasn't changed for that system (you haven't made a config change that affects that system between runs), then that's considered a catalog duplicate.
>
> Internally, PuppetDB uses both of these concepts to improve performance. If a new catalog is exactly the same as the previously stored one for a node, then there's no need to use up IO to store it again. Similarly, if a catalog contains 90% the same resources that already exist on other nodes, PuppetDB doesn't need to store those resources either (rather, we can just store pointers to already-existing data in the database).
>
> Now, are the numbers you posted good/bad? In the field, we overwhelmingly see resource duplication and catalog duplication in the 85-95% range. So I'd say that your low resource duplication rate is atypical. It may indicate that you are perhaps not leveraging abstractions in your Puppet code, or it could be that you really, truly have a large number of unique resources. One thing I can definitely say, though, is that the higher your resource duplication rate, the faster PuppetDB will run.
>
> Now, regarding the max query results: I'd set that to whatever works for you. If you're doing queries that return a huge number of results, then feel free to bump that setting up. The only caveat is, as mentioned before, you need to make sure you give PuppetDB enough heap to actually deal with that size of result set.
>
> Lastly, as Ken Barber indicated, we've already merged in code that eliminates the need for that setting. We now stream resource query results to the client on-the-fly, avoiding batching things up in memory first.
> This results in much lower memory usage, and greatly reduces the time before the client gets the first result. So... problem solved? :)
>
> deepak
>
>> If I did not mention it previously, heap is currently set to 1G, and looking at the spark line I seem to be maxing out right now at about 500MB.
>>
>> On Thu, Sep 26, 2013 at 3:33 AM, David Schmitt <[email protected]> wrote:
>>
>>> On 26.09.2013 05:17, Christopher Wood wrote:
>>>
>>>> On Wed, Sep 25, 2013 at 02:25:50PM +0100, Ken Barber wrote:
>>>>
>>>>> (SNIP)
>>>>>
>>>>> http://puppetdb1.vm:8080/dashboard/index.html. Since Puppet doesn't put a limit on # of resources per node, it's hard to say if your case is a problem somewhere. It does, however, sound exceptional but not unlikely (I've seen some nodes with 10k resources apiece, for example).
>>>>
>>>> Now I'm curious about:
>>>>
>>>> who these people are
>>>
>>> Me, for example.
>>>
>>>> why they need 10,000 resources per host
>>>
>>> Such numbers are easy to reach when every service exports a nagios check into a central server.
>>>
>>>> how they keep track of everything
>>>
>>> High modularity. See below.
>>>
>>>> how long an agent run takes
>>>
>>> Ages. The biggest node I know takes around 44 minutes to run.
>>>
>>>> and how much cpu/ram an agent run takes
>>>
>>> Too much.
>>>
>>>> and how they troubleshoot the massive debug output
>>>
>>> Since these 10k+ resources are 99% the same, there is not much to troubleshoot.
>>>
>>> Regards, David
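David's "high modularity" answer typically means funnelling every check through a shared defined type, so thousands of exported resources share one structure and differ only in data. A sketch in period-appropriate Puppet 3 style (`monitoring::check` is a hypothetical name, not from this thread):

```puppet
# Hypothetical wrapper: services declare checks uniformly, so every
# exported nagios_service has identical shape and differs only in
# its parameters.
define monitoring::check ($command, $description) {
  @@nagios_service { "${title}_${::fqdn}":
    check_command       => $command,
    host_name           => $::fqdn,
    service_description => $description,
    use                 => 'generic-service',
    tag                 => 'nagios',
  }
}

# Any service module can then add itself to monitoring in one line:
monitoring::check { 'http':
  command     => 'check_http',
  description => 'HTTP',
}
```

This keeps 10k+ resources tractable for humans; it does not help PuppetDB's duplication rate, since each generated resource still embeds `$::fqdn`.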
--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/puppet-users.
For more options, visit https://groups.google.com/groups/opt_out.
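For reference, the query cap discussed in this thread was a PuppetDB setting at the time; a sketch of where it lived in the 1.x line (paths vary by package, and later releases removed the setting once query results were streamed — check the docs for your version):

```ini
; /etc/puppetdb/conf.d/config.ini
[database]
; Maximum rows a resource query may return: the "20000 default" above.
; Matthew raised it to 50000 to stop the failing queries.
resource-query-limit = 50000
```

Heap itself is set separately, via `JAVA_ARGS="-Xmx1g"` in `/etc/sysconfig/puppetdb` (RHEL) or `/etc/default/puppetdb` (Debian); Matthew's 1G with a ~500MB peak on the dashboard spark line suggests modest headroom for larger result sets.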
