[Puppet-dev] Re: Puppet environments implementation question - get_conf() location and result caching

2015-04-17 Thread Henrik Lindberg

On 2015-16-04 18:34, Bostjan Skufca wrote:



On Thursday, 16 April 2015 14:01:29 UTC+2, henrik lindberg wrote:

On 2015-16-04 4:30, Bostjan Skufca wrote:
  Just recently I was looking at the environments part of puppet
  implementation, and I have stumbled upon this curious gimmick.
 
  There is Puppet::Environments class, which is a class for general
  Environments management (searching, loaders, caching, etc).
  Then there is Puppet::Node::Environment which is instantiated for
every
  environent found. Logical up to this point.
 
  But, contrary to all OOP notions, get_conf() function is not a
method of
  Puppet::Node::Environment class.
  Rather, it is a lookup function in Puppet::Environments class which
  takes environment name as an argument and fetches
  Puppet::Settings::EnvironmentConf instance.
 
  Puppet::Environments has two function for environment and
environment
  data retrieval:
  - get() fetches Puppet::Node::Environment instance
  - get_conf() fetches Puppet::Settings::EnvironmentConf instance
 
  The curious thing is:
  - get() method operates with cache
  - get_conf() does not include any caching
 
  I tested this on a catalog of 1633 resources, and:
  - get() was called around 220 times
  - get_conf() was called around 25.000 times
 
  These are the comments above get_conf() in environments.rb:
   # This implementation evicts the cache, and always gets the
current
   # configuration of the environment
   #
   # TODO: While this is wasteful since it
   # needs to go on a search for the conf, it is too disruptive to
  optimize
   # this.
 
  My questions are:
  - why does get_conf() method belong to Environments instead of
  Node::Environment class?
  - why are get_conf() results not cached? Why is it too disruptive?
 

It should be fine to cache the env conf in an instance of an
environment, and it should be possible to obtain the config given
an instantiated environment. This is safe since the conf will then be
evicted when the environment is evicted (and while the environment is
cached it will not reload or change).


It would be more logical that way too, for the user. If there is
environment timeout, user would expect that all parts of environment
(env configuration, code, even manifests) are cached. At least that was
my understanding until I actually looked at the code.

Shall I create a Jira ticket and pull request with working code change?

Yes please. A ticket for fix-version PUP 4.x (we then update it with the 
actual release it will go into). I don't think it is worth the trouble 
of fixing this on 3.x. Target the code to the master branch.



I believe that the large number of calls
to get the conf could almost exclusively be for an already loaded
environment (I should be measured though).


Yes, this is true. All the calls are for the environment where the
node/agent resides. I tested this with single node only, though.


Good to know. Then this is worth doing.


All other calls would be for information purposes and those cannot be
cached unless the environment.conf file is watched (which is difficult
since it may not exist, come into existence, be removed etc.). For
those
calls, the cost of checking if the file is up to date is almost as
expensive as loading the conf.


All other calls - do you mean calls for other environments?
If so, as said above, I did not notice this on my setup (could not). But
I believe that if env caching follows the same pattern for all
environments, it should not be a problem.

I do mean calls for other environments than the one(s) currently loaded 
and in use/cached by the master. This happens for instance when 
commands/services query for available environments, asks which classes 
that are in an environment etc. These calls when made for environments 
that are loaded, should read the conf directly (they are almost never 
interested in loading the environment they want information about).




Since the 3.x implementation is very complex due to the dynamic
environments, I suggest that this should be fixed for 4.x.


This is probably for the best, true. If one wants to refrain from using
dynamic environments and wishes to use 3.7.x, a separate patch may be
provided.


Cheers, and thanks for taking a stab at improving this!

- henrik

--

Visit my Blog Puppet on the Edge
http://puppet-on-the-edge.blogspot.se/

--
You received this message because you are subscribed to the Google Groups Puppet 
Developers group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/mgr0bs%24b1c%241%40ger.gmane.org.
For more options, visit 

[Puppet-dev] Re: Puppet environments implementation question - get_conf() location and result caching

2015-04-16 Thread Henrik Lindberg

On 2015-16-04 4:30, Bostjan Skufca wrote:

Just recently I was looking at the environments part of puppet
implementation, and I have stumbled upon this curious gimmick.

There is Puppet::Environments class, which is a class for general
Environments management (searching, loaders, caching, etc).
Then there is Puppet::Node::Environment which is instantiated for every
environent found. Logical up to this point.

But, contrary to all OOP notions, get_conf() function is not a method of
Puppet::Node::Environment class.
Rather, it is a lookup function in Puppet::Environments class which
takes environment name as an argument and fetches
Puppet::Settings::EnvironmentConf instance.

Puppet::Environments has two function for environment and environment
data retrieval:
- get() fetches Puppet::Node::Environment instance
- get_conf() fetches Puppet::Settings::EnvironmentConf instance

The curious thing is:
- get() method operates with cache
- get_conf() does not include any caching

I tested this on a catalog of 1633 resources, and:
- get() was called around 220 times
- get_conf() was called around 25.000 times

These are the comments above get_conf() in environments.rb:
 # This implementation evicts the cache, and always gets the current
 # configuration of the environment
 #
 # TODO: While this is wasteful since it
 # needs to go on a search for the conf, it is too disruptive to
optimize
 # this.

My questions are:
- why does get_conf() method belong to Environments instead of
Node::Environment class?
- why are get_conf() results not cached? Why is it too disruptive?



The path to the new environments implementation (supporting both 
directory environments, and the now deprecated (and in 4.0 removed) 
dynamic environments) was paved with problems as it affected more or

less the entire code base.

It should be fine to cache the env conf in an instance of an 
environment, and it should be possible to obtain the config given
an instantiated environment. This is safe since the conf will then be 
evicted when the environment is evicted (and while the environment is 
cached it will not reload or change).


I believe that the large number of calls
to get the conf could almost exclusively be for an already loaded 
environment (I should be measured though).
All other calls would be for information purposes and those cannot be 
cached unless the environment.conf file is watched (which is difficult 
since it may not exist, come into existence, be removed etc.). For those 
calls, the cost of checking if the file is up to date is almost as 
expensive as loading the conf.


Since the 3.x implementation is very complex due to the dynamic 
environments, I suggest that this should be fixed for 4.x.


- henrik


Thanks for answers,
b.

--
You received this message because you are subscribed to the Google
Groups Puppet Developers group.
To unsubscribe from this group and stop receiving emails from it, send
an email to puppet-dev+unsubscr...@googlegroups.com
mailto:puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/puppet-dev/53b560d5-1c8f-4b00-944c-3156343a175f%40googlegroups.com
https://groups.google.com/d/msgid/puppet-dev/53b560d5-1c8f-4b00-944c-3156343a175f%40googlegroups.com?utm_medium=emailutm_source=footer.
For more options, visit https://groups.google.com/d/optout.



--

Visit my Blog Puppet on the Edge
http://puppet-on-the-edge.blogspot.se/

--
You received this message because you are subscribed to the Google Groups Puppet 
Developers group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/mgo8ac%24hc5%241%40ger.gmane.org.
For more options, visit https://groups.google.com/d/optout.


[Puppet-dev] Re: Puppet environments implementation question - get_conf() location and result caching

2015-04-16 Thread Bostjan Skufca


On Thursday, 16 April 2015 14:01:29 UTC+2, henrik lindberg wrote:

 On 2015-16-04 4:30, Bostjan Skufca wrote: 
  Just recently I was looking at the environments part of puppet 
  implementation, and I have stumbled upon this curious gimmick. 
  
  There is Puppet::Environments class, which is a class for general 
  Environments management (searching, loaders, caching, etc). 
  Then there is Puppet::Node::Environment which is instantiated for every 
  environent found. Logical up to this point. 
  
  But, contrary to all OOP notions, get_conf() function is not a method of 
  Puppet::Node::Environment class. 
  Rather, it is a lookup function in Puppet::Environments class which 
  takes environment name as an argument and fetches 
  Puppet::Settings::EnvironmentConf instance. 
  
  Puppet::Environments has two function for environment and environment 
  data retrieval: 
  - get() fetches Puppet::Node::Environment instance 
  - get_conf() fetches Puppet::Settings::EnvironmentConf instance 
  
  The curious thing is: 
  - get() method operates with cache 
  - get_conf() does not include any caching 
  
  I tested this on a catalog of 1633 resources, and: 
  - get() was called around 220 times 
  - get_conf() was called around 25.000 times 
  
  These are the comments above get_conf() in environments.rb: 
   # This implementation evicts the cache, and always gets the current 
   # configuration of the environment 
   # 
   # TODO: While this is wasteful since it 
   # needs to go on a search for the conf, it is too disruptive to 
  optimize 
   # this. 
  
  My questions are: 
  - why does get_conf() method belong to Environments instead of 
  Node::Environment class? 
  - why are get_conf() results not cached? Why is it too disruptive? 
  

 It should be fine to cache the env conf in an instance of an 
 environment, and it should be possible to obtain the config given 
 an instantiated environment. This is safe since the conf will then be 
 evicted when the environment is evicted (and while the environment is 
 cached it will not reload or change).


It would be more logical that way too, for the user. If there is 
environment timeout, user would expect that all parts of environment (env 
configuration, code, even manifests) are cached. At least that was my 
understanding until I actually looked at the code.

Shall I create a Jira ticket and pull request with working code change?

 

 I believe that the large number of calls 
 to get the conf could almost exclusively be for an already loaded 
 environment (I should be measured though). 


Yes, this is true. All the calls are for the environment where the 
node/agent resides. I tested this with single node only, though.

 

 All other calls would be for information purposes and those cannot be 
 cached unless the environment.conf file is watched (which is difficult 
 since it may not exist, come into existence, be removed etc.). For those 
 calls, the cost of checking if the file is up to date is almost as 
 expensive as loading the conf. 


All other calls - do you mean calls for other environments?
If so, as said above, I did not notice this on my setup (could not). But I 
believe that if env caching follows the same pattern for all environments, 
it should not be a problem.


Since the 3.x implementation is very complex due to the dynamic 
 environments, I suggest that this should be fixed for 4.x. 


This is probably for the best, true. If one wants to refrain from using 
dynamic environments and wishes to use 3.7.x, a separate patch may be 
provided.


b.

-- 
You received this message because you are subscribed to the Google Groups 
Puppet Developers group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/57e71df8-0a83-4cd7-a851-6f2fc2e0954c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.