I currently have two different message brokers running: ActiveMQ on one
node and RabbitMQ on a different one. If I switch my MCollective servers
from one broker to the other and run 'mco ping' while the nodes are
registering with the other broker, I hit a threshold somewhere.

Scenario:

I start 'mco ping' from two clients simultaneously, each client
connected to a different broker. Both can successfully ping all
nodes.

client 1 -

199.49 ms
238.21 ms
275.27 ms
312.39 ms
350.79 ms
387.84 ms
[...]
2624.43 ms
2993.36 ms

---- ping statistics ----
50 replies max: 2993.36 min: 199.49 avg: 1030.95

client 2 -

469.72 ms
492.41 ms
516.31 ms
541.34 ms
638.03 ms
664.93 ms
[...]
4254.50 ms
4276.60 ms

---- ping statistics ----
160 replies max: 4276.60 min: 469.72 avg: 2414.46
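
One quick way to read these numbers is to take the difference between
consecutive reply times. The small Python sketch below does that for the
visible lines of each run only. The elided "[...]" parts are not
reconstructed, and the variable and function names are made up for the
example.

```python
from statistics import median

# Reply times (ms) copied from the visible lines of the two ping runs above;
# the "[...]" parts are not reconstructed.
CLIENT1_MS = [199.49, 238.21, 275.27, 312.39, 350.79, 387.84]
CLIENT2_MS = [469.72, 492.41, 516.31, 541.34, 638.03, 664.93]


def reply_gaps(times_ms):
    """Time between each reply and the previous one, rounded to 0.01 ms."""
    return [round(later - earlier, 2) for earlier, later in zip(times_ms, times_ms[1:])]


if __name__ == "__main__":
    for label, times in (("client 1", CLIENT1_MS), ("client 2", CLIENT2_MS)):
        gaps = reply_gaps(times)
        # The median is used so a single outlier gap does not dominate.
        print(f"{label}: gaps {gaps} ms, median {median(gaps):.2f} ms")
```

For client 1 the gaps are 38.72, 37.06, 37.12, 38.4 and 37.05 ms, with a
median of 37.12 ms. For client 2 the median is 25.03 ms, with one 96.69 ms
outlier. A steady step of a few tens of milliseconds per reply fits the idea
that the client does a roughly fixed amount of work on each reply before it
handles the next one. That step is in the same range as the 30 to 40 ms AES
overhead mentioned in the quoted reply further down. This is a reading of the
numbers, not a measurement of where the time actually goes.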

The command does complete, and the ping times start much lower. I found
some nodes with less than 512 MB of RAM. Could these be causing the
problem because they are swapping and do not respond quickly enough, so
that their reply somehow gets lost along the way and breaks the
mechanism?

If I run 'mco rpc service status service=ssh' on both clients:

client 1 -

Determining the amount of hosts matching filter for 2 seconds .... 50

 * [ ============================================================> ] 50 / 50


client 2 -

Determining the amount of hosts matching filter for 2 seconds ....
warn: Could not decrypt message from client: #<Class:0x7f16ac7ba268>:
execution expired
warn: Ignoring a message that did not pass security validations
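
"execution expired" is the default message of Ruby's Timeout::Error. One
plausible reading of these two warnings is therefore that the decrypt step
was cut off by a timeout and the message was then discarded. The Python
sketch below only models that pattern: each message is decrypted under a
per-message time budget and is dropped with a warning if it overruns. The
names `fake_decrypt` and `handle_messages`, the costs and the 0.2 s budget are
all hypothetical. This is not MCollective's actual (Ruby) code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fake_decrypt(payload: bytes, cost_s: float) -> bytes:
    """Hypothetical stand-in for an expensive decrypt; sleeps to model CPU or swap delay."""
    time.sleep(cost_s)
    return payload.upper()  # placeholder transformation, not real decryption


def handle_messages(messages, budget_s):
    """Decrypt each (sender, payload, cost_s) tuple under a per-message time budget.

    Messages whose decrypt overruns the budget are logged and skipped,
    roughly mirroring the two warn lines above.
    """
    accepted, ignored = [], []
    # One worker per message, so a stuck decrypt does not delay the next one.
    with ThreadPoolExecutor(max_workers=max(1, len(messages))) as pool:
        for sender, payload, cost_s in messages:
            future = pool.submit(fake_decrypt, payload, cost_s)
            try:
                accepted.append((sender, future.result(timeout=budget_s)))
            except FutureTimeout:
                print(f"warn: Could not decrypt message from {sender}: execution expired")
                print("warn: Ignoring a message that did not pass security validations")
                ignored.append(sender)
    return accepted, ignored


if __name__ == "__main__":
    sample = [("node1", b"ok", 0.0), ("node2", b"slow", 0.5), ("node3", b"ok", 0.0)]
    print(handle_messages(sample, budget_s=0.2))
```

With the sample input, node1 and node3 are accepted and node2 is ignored.
Under this reading, anything that makes decryption slow on the receiving side
(a busy or swapping host, for instance) would make messages disappear rather
than merely arrive late.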

I'm adjusting my manifest to ensure the mcollective service is not
running on nodes with less than 512 MB of RAM.
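
The actual change belongs in the Puppet manifest. The Python sketch below
only spells out the threshold decision. It assumes each node's memory is
available as a Facter-style `memorysize` string such as "490.52 MB" or
"1.96 GB". That string format, the helper names and the sample values are
assumptions made for illustration.

```python
import re

# Assumed format of the "memorysize" fact: "<number> <unit>", e.g. "1.96 GB".
UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024}
THRESHOLD_MB = 512  # from the text: don't run mcollective on nodes with < 512 MB


def memory_mb(memorysize: str) -> float:
    """Convert a memorysize-style string to megabytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]B)\s*", memorysize)
    if match is None:
        raise ValueError(f"unrecognised memorysize value: {memorysize!r}")
    number, unit = match.groups()
    return float(number) * UNIT_TO_MB[unit]


def should_run_mcollective(memorysize: str, threshold_mb: float = THRESHOLD_MB) -> bool:
    """True when the node has at least threshold_mb of RAM."""
    return memory_mb(memorysize) >= threshold_mb


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the real inventory.
    for value in ("490.52 MB", "512.00 MB", "1.96 GB"):
        print(value, "->", "run" if should_run_mcollective(value) else "stop")
```

Nodes at exactly 512 MB keep running the service. Only those strictly below
the threshold are excluded, matching "< 512" above.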

2012/6/8 R.I.Pienaar <r...@devco.net>:
>
>
> ----- Original Message -----
>> From: "Martin Willemsma" <mwillem...@gmail.com>
>> To: puppet-users@googlegroups.com
>> Sent: Friday, June 8, 2012 7:11:39 AM
>> Subject: Re: [Puppet Users] MCollective not all nodes answer to commands 
>> when using aes_security plugin
>>
> >> Thanks for your response.
> >> One thing I noticed when using PSK is that I do see the discovery
> >> with the progress bar. That's something I never see when using AES.
> >> Commands always come back from discovered nodes when using PSK.
> >>
> >> You suggest SSL TLS; is that the same as the AES provider I'm using
> >> right now?
>
> The identity of the client is securely established and the payload is
> encrypted using industry standards; I guess it depends on your needs, though.
>
>>
> >> I run the client on the same node as RabbitMQ. I also tried with an
> >> ActiveMQ installation on another node in the same subnet. It doesn't
> >> seem to make any difference. The node running the message bus is
> >> mostly idle: 4 CPUs / 4 GB RAM, and the other node 2 CPUs / 2 GB RAM.
> >> I also tried running the client on my workstation (i5 / 8 GB RAM /
> >> SSD), with the same behavior.
>>
> >> I agree that the ping times are pretty high, but I could live with
> >> that if at least all the replies came back.
>
> Ping times that long will just prevent everything from working; there's
> a fundamental problem somewhere.
>
> >> I spent quite some time making this work on our platform. I need to
> >> look more in depth at the network part.
>>
>> 2012/6/7 R.I.Pienaar <r...@devco.net>:
>> >
>> >
>> > ----- Original Message -----
>> >> From: "Martin Willemsma" <mwillem...@gmail.com>
>> >> To: "Puppet Users" <puppet-users@googlegroups.com>
>> >> Sent: Thursday, June 7, 2012 7:11:41 AM
>> >> Subject: [Puppet Users] MCollective not all nodes answer to
>> >> commands when using aes_security plugin
>> >>
>> >> Hi,
>> >>
> >> >> I deployed MCollective to our Puppet clients, approx. 200 nodes. Our
> >> >> platform requires the most secure setup possible, so PSK as the
> >> >> security provider is not an option.
>> >
>> > I'd almost always suggest SSL TLS + the ssl plugin now.
>> >
> >> >> Therefore I changed the security provider to aes_security, reusing
> >> >> Puppet's certificates in the server.cfg as described in the docs (1).
> >> >> Our goal is to use MCollective to offload event-driven actions from a
> >> >> web application to agents running on designated nodes.
> >> >>
> >> >> E.g.: send a message to the 'platform' collective to create a DNS
> >> >> record. This message should be processed by a node that runs the
> >> >> 'DNS' agent.
>> >>
> >> >> One thing I noticed after switching to the aes_security plugin is that
> >> >> the ping latency went up and a reply to an action does not come back
> >> >> from all the nodes. Where does this latency come from?
> >> >> If I run 'mco ping' on the client, I expect it to:
> >> >>
> >> >> - get a response from every node
> >> >> - show me the ---- ping statistics ---- at the end
> >> >> - jump back to my console, ready for the next command
> >> >>
> >> >> but it does not. Instead it shows me the output for 207 nodes and then
> >> >> it just "HANGS" there.
>> >
>> >>
> >> >> This output shows the ping times (hostnames omitted):
>> >>
>> >> 1340.38 ms <- first reply
>> >> 1406.25 ms
>> >> 1456.71 ms
>> >> 1508.19 ms
>> >> 1550.52 ms
>> >> 1576.07 ms
>> >> 1601.15 ms
>> >> 1627.40 ms
>> >> 1653.23 ms
>> >> 1678.26 ms
>> >> [ .. omitted intentionally ]
>> >> 7518.66 ms
>> >> 7556.47 ms
>> >> 7593.06 ms
>> >> 7623.46 ms
>> >> 7648.64 ms
>> >> 7685.62 ms
>> >> 7722.84 ms <- last reply I see on the client console
>> >
>> >
> >> > There are a few odd things here. First, the first reply is way too
> >> > slow. The AES plugin is computationally very heavy and not suited for
> >> > large deployments; yours, though, is not large, and even then the
> >> > overhead is in the 30 to 40 ms range over that of the PSK plugin on
> >> > first response. The effect snowballs, but this is not the performance
> >> > I would expect.
>> >
> >> > Second, 'mco ping' should not run indefinitely until you stop it; it
> >> > should run for 5 seconds and then end. Does yours do that with the PSK
> >> > plugin active?
>> >
> >> > It's hard to guess the underlying cause of this combination of
> >> > issues. It could be a very slow machine running the mco client, or it
> >> > could be issues on the network; perhaps there are a lot of TCP
> >> > retransmissions or something along those lines.
>> >
> >> > On the machines that do not respond, do you see anything in their
> >> > logs? Put them in debug mode and make sure they got the request and
> >> > replied. Anything weird on your broker? High CPU usage, perhaps?
>> >
>> >
>> >>
>> >>
> >> >> If I check the log file '/var/log/mcollective.log' on the client
> >> >> sending the command, the last few lines show me:
>> >>
> >> >> D, [2012-06-07T07:39:46.470905 #15910] DEBUG -- : pluginmanager.rb:83:in `[]' Returning cached plugin security_plugin with class MCollective::Security::Aes_security
> >> >> D, [2012-06-07T07:39:46.471029 #15910] DEBUG -- : aes_security.rb:202:in `deserialize' De-Serializing using marshal
> >> >> D, [2012-06-07T07:39:46.471121 #15910] DEBUG -- : aes_security.rb:255:in `decrypt' Decrypting message using private key
> >> >> D, [2012-06-07T07:39:46.495265 #15910] DEBUG -- : aes_security.rb:202:in `deserialize' De-Serializing using marshal
> >> >> D, [2012-06-07T07:39:46.495711 #15910] DEBUG -- : stomp.rb:191:in `receive' Waiting for a message from Stomp
>> >>
> >> >> I can wait forever, but nothing more is received.
> >> >> I use Control+Break to exit:
>> >>
>> >> ^C
>> >>
>> >> ---- ping statistics ----
>> >> 207 replies max: 6877.20 min: 616.98 avg: 3912.99
>> >>
>> >> Logfile shows me:
>> >>
> >> >> D, [2012-06-07T07:41:10.571316 #15910] DEBUG -- : client.rb:72:in `unsubscribe' Unsubscribing reply target for discovery
> >> >> D, [2012-06-07T07:41:10.571496 #15910] DEBUG -- : pluginmanager.rb:83:in `[]' Returning cached plugin connector_plugin with class MCollective::Connector::Stomp
> >> >> D, [2012-06-07T07:41:10.571615 #15910] DEBUG -- : stomp.rb:257:in `unsubscribe' Unsubscribing from /topic/mcollective.discovery.reply
> >> >> D, [2012-06-07T07:41:10.572767 #15910] DEBUG -- : pluginmanager.rb:83:in `[]' Returning cached plugin connector_plugin with class MCollective::Connector::Stomp
> >> >> D, [2012-06-07T07:41:10.572849 #15910] DEBUG -- : stomp.rb:264:in `disconnect' Disconnecting from Stomp
>> >>
> >> >> Same behavior with any of the other commands: 'get_fact',
> >> >> 'rpc package', 'rpc service'. I'm just not able to do a search over
> >> >> the collective when using the AES plugin.
> >> >>
> >> >> If I switch back to PSK, replies are speedy and always come back. But
> >> >> then again, this is not what we want.
>> >>
> >> >> At first I was using the RabbitMQ default config. I tried some
> >> >> tweaking, but it did not seem to make any difference to the behaviour
> >> >> of mco. I switched to ActiveMQ 5.6 with the config files from
> >> >> puppetlabs.git, set it up according to the docs, again played with
> >> >> some settings, and that did not change anything at all.
>> >>
> >> >> tcpdumps show that the node running the MCollective server responds
> >> >> to the message sent from the MCollective client, but the output is
> >> >> printed on the client only seconds after the node replies. Somehow it
> >> >> looks like the message gets 'STUCK' in the message bus and arrives
> >> >> late on the client.
> >> >>
> >> >> Any hints on where to tackle this issue are more than welcome and
> >> >> really appreciated. This issue is blocking the implementation of
> >> >> MCollective on our platform, which is more than just sad.
>> >>
>> >> Currently I'm using MCollective 2.0.0 on Ubuntu 10.04 LTS X86_64.
>> >>
>> >> (1)
>> >> http://docs.puppetlabs.com/mcollective/reference/plugins/security_aes.html
>> >>
>> >> ---
>> >> Best regards,
>> >>
>> >> Martin
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> >> Groups "Puppet Users" group.
>> >> To post to this group, send email to
>> >> puppet-users@googlegroups.com.
>> >> To unsubscribe from this group, send email to
>> >> puppet-users+unsubscr...@googlegroups.com.
>> >> For more options, visit this group at
>> >> http://groups.google.com/group/puppet-users?hl=en.
>> >>
>> >>
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "Puppet Users" group.
>> > To post to this group, send email to puppet-users@googlegroups.com.
>> > To unsubscribe from this group, send email to
>> > puppet-users+unsubscr...@googlegroups.com.
>> > For more options, visit this group at
>> > http://groups.google.com/group/puppet-users?hl=en.
>> >
>>
>>
>>
>> --
>> ---
> >> Kind regards,
>>
>> Martin Willemsma
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Puppet Users" group.
>> To post to this group, send email to puppet-users@googlegroups.com.
>> To unsubscribe from this group, send email to
>> puppet-users+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/puppet-users?hl=en.
>>
>>
>
>



-- 
---
Kind regards,

Martin Willemsma

