Re: [ansible-devel] Ansilbe read all variable even they do not need for task (perfomance issue with lookup_plugins)

2016-09-12 Thread Ben Roubicek
Bug report here: https://github.com/ansible/ansible/issues/17024

On Monday, September 12, 2016 at 8:55:20 AM UTC-7, Ben Roubicek wrote:
>
> Tobias, I must have jumped the gun on Friday, apologies. Looks like its 
> actually a different commit causing the issue.  Sorry :-)
>
> Culprit commit:  
> https://github.com/ansible/ansible/commit/6b286ee0c88f3252a27ff94f39d794566c3f7d6f#diff-76ffc0551f8cf3d6255500316568e60bR412
>
> My scenario is that I have a inventory/foo/group_vars/all/hashivault.yml 
> file that specifies 100+ secrets that are to be retrieved via a custom 
> lookup plugin.  Before this commit, evaluation of the variables would only 
> occur when they are used.  With this commit applied, each task is 
> re-evaluating the variables causing thousands of calls during the play.
>
> My lookup calls look like:
> db_password: "{{ lookup('vault', 'secret/db_password', 'value') }}"
>
>
> Anyway, tinkering with the code a bit.
> in lib/ansible/executor/task_executor.py
> By simply changing 
> self._connection.set_host_overrides(host=self._host, hostvars=variables.
> get('hostvars', {}).get(self._host.name, {}))
>
> back to: 
> self._connection.set_host_overrides(host=self._host)
>
> Reduces the calls back to only those required by the task.
>
> I dont believe I understand the code well enough to make a PR, so I would 
> appreciate some advice on how best to proceed with a fix (assuming this was 
> an unintended side-effect of this commit).
>
> Thanks
> -Ben
>
> On Saturday, September 10, 2016 at 1:51:23 AM UTC-7, Tobias Wolf wrote:
>>
>> Can you point out how you think that particular commit affects how often 
>> lookups are evaluated? I don't see it.
>>
>> All that is doing is to make a os.listdir (essentially an ls) in the 
>> host_vars and group_vars directories and then checking group names and host 
>> names against this listing instead of checking on the file system causing a 
>> storm of lstats.
>>
>> Maybe you were mistaken by the word "lookup"? Lookup refers to `stat` 
>> here.
>>
>> regards,
>> Tobias (author of that commit)
>>
>> On Saturday, September 10, 2016 at 2:31:56 AM UTC+2, Ben Roubicek wrote:
>>>
>>> Also, I found the commit that changed this behavior.
>>>
>>> https://github.com/ansible/ansible/commit/7287effb5ce241ce645d61e55e981edc73fa382a
>>>
>>>
>>>
>>> On Friday, September 9, 2016 at 5:01:23 PM UTC-7, Ben Roubicek wrote:

 Following up here. The behavior of lookup plugin evaluation has 
 definitely changed post 2.1.0.0.  In 2.1.0.0, lookup calls seem to be lazy 
 evaluated so if you perform an HTTP lookup to Hashicorp Vault, for 
 example, 
 that call is only made when the variable is used in the play.  Post 
 2.1.0.0, all variables seem to be evaluated for each task!  Thats a huge 
 change of behavior causing considerable performance problems for us since 
 we relay on gathering secrets from an external source specified at the 
 inventory level.

 Is there an explanation for why that change was made?  Can it be 
 changed back to the previous behavior?

 On Thursday, August 25, 2016 at 12:05:18 AM UTC-7, jhawkesworth wrote:
>
> I wonder if you can use uri delegated to your localhost to collect 
> your facts, and use set_fact to make them available to the play.
>
> Also I wonder if a custom setup module might be able to do what you 
> need? If you can get the vars you need into facts you can use one of the 
> existing fact caching mechanisms 
> http://docs.ansible.com/ansible/playbooks_variables.html#fact-caching
>
> Jon
>
> On Tuesday, August 23, 2016 at 2:54:36 PM UTC+1, Artyom Aleksandrov 
> wrote:
>>
>> Hello,
>>
>>
>> Thank you for answer. It's bad news because all variables which *can 
>> be* used by host  reads  on each task. 
>> Variables in *all* group of course provide most reads but problem 
>> also applies to other groups and host_vars. =(
>>  
>> > Can you give a different example of that being in a group the 
>> current host is not a member of and yet still being referenced?
>>
>> No. There is no problem with this case.
>>
>> Have you thought about changing this behavior or implementing runtime 
>> cache for variables?
>>
>>
>> On Wednesday, August 17, 2016 at 6:05:29 PM UTC+3, James Cammarata 
>> wrote:
>>>
>>> Sorry for the delayed response, but in the case of having it in the 
>>> `all` group, that var would be referenced on every task due to the fact 
>>> that it's being brought in for all hosts.
>>>
>>> Can you give a different example of that being in a group the 
>>> current host is not a member of and yet still being referenced?
>>>
>>> James Cammarata
>>>
>>> Ansible Lead/Sr. Principal Software Engineer
>>> Ansible by Red Hat
>>> twitter: @thejimic, github: jimi-c
>>>
>>> On Fri, Aug 5, 2016 at 4:07 PM, 

Re: [ansible-devel] Ansilbe read all variable even they do not need for task (perfomance issue with lookup_plugins)

2016-09-12 Thread Ben Roubicek
Tobias, I must have jumped the gun on Friday, apologies. Looks like its 
actually a different commit causing the issue.  Sorry :-)

Culprit commit: 
 
https://github.com/ansible/ansible/commit/6b286ee0c88f3252a27ff94f39d794566c3f7d6f#diff-76ffc0551f8cf3d6255500316568e60bR412

My scenario is that I have a inventory/foo/group_vars/all/hashivault.yml 
file that specifies 100+ secrets that are to be retrieved via a custom 
lookup plugin.  Before this commit, evaluation of the variables would only 
occur when they are used.  With this commit applied, each task is 
re-evaluating the variables causing thousands of calls during the play.

My lookup calls look like:
db_password: "{{ lookup('vault', 'secret/db_password', 'value') }}"


Anyway, tinkering with the code a bit.
in lib/ansible/executor/task_executor.py
By simply changing 
self._connection.set_host_overrides(host=self._host, hostvars=variables.get(
'hostvars', {}).get(self._host.name, {}))

back to: 
self._connection.set_host_overrides(host=self._host)

Reduces the calls back to only those required by the task.

I dont believe I understand the code well enough to make a PR, so I would 
appreciate some advice on how best to proceed with a fix (assuming this was 
an unintended side-effect of this commit).

Thanks
-Ben

On Saturday, September 10, 2016 at 1:51:23 AM UTC-7, Tobias Wolf wrote:
>
> Can you point out how you think that particular commit affects how often 
> lookups are evaluated? I don't see it.
>
> All that is doing is to make a os.listdir (essentially an ls) in the 
> host_vars and group_vars directories and then checking group names and host 
> names against this listing instead of checking on the file system causing a 
> storm of lstats.
>
> Maybe you were mistaken by the word "lookup"? Lookup refers to `stat` here.
>
> regards,
> Tobias (author of that commit)
>
> On Saturday, September 10, 2016 at 2:31:56 AM UTC+2, Ben Roubicek wrote:
>>
>> Also, I found the commit that changed this behavior.
>>
>> https://github.com/ansible/ansible/commit/7287effb5ce241ce645d61e55e981edc73fa382a
>>
>>
>>
>> On Friday, September 9, 2016 at 5:01:23 PM UTC-7, Ben Roubicek wrote:
>>>
>>> Following up here. The behavior of lookup plugin evaluation has 
>>> definitely changed post 2.1.0.0.  In 2.1.0.0, lookup calls seem to be lazy 
>>> evaluated so if you perform an HTTP lookup to Hashicorp Vault, for example, 
>>> that call is only made when the variable is used in the play.  Post 
>>> 2.1.0.0, all variables seem to be evaluated for each task!  Thats a huge 
>>> change of behavior causing considerable performance problems for us since 
>>> we relay on gathering secrets from an external source specified at the 
>>> inventory level.
>>>
>>> Is there an explanation for why that change was made?  Can it be changed 
>>> back to the previous behavior?
>>>
>>> On Thursday, August 25, 2016 at 12:05:18 AM UTC-7, jhawkesworth wrote:

 I wonder if you can use uri delegated to your localhost to collect your 
 facts, and use set_fact to make them available to the play.

 Also I wonder if a custom setup module might be able to do what you 
 need? If you can get the vars you need into facts you can use one of the 
 existing fact caching mechanisms 
 http://docs.ansible.com/ansible/playbooks_variables.html#fact-caching

 Jon

 On Tuesday, August 23, 2016 at 2:54:36 PM UTC+1, Artyom Aleksandrov 
 wrote:
>
> Hello,
>
>
> Thank you for answer. It's bad news because all variables which *can 
> be* used by host  reads  on each task. 
> Variables in *all* group of course provide most reads but problem 
> also applies to other groups and host_vars. =(
>  
> > Can you give a different example of that being in a group the 
> current host is not a member of and yet still being referenced?
>
> No. There is no problem with this case.
>
> Have you thought about changing this behavior or implementing runtime 
> cache for variables?
>
>
> On Wednesday, August 17, 2016 at 6:05:29 PM UTC+3, James Cammarata 
> wrote:
>>
>> Sorry for the delayed response, but in the case of having it in the 
>> `all` group, that var would be referenced on every task due to the fact 
>> that it's being brought in for all hosts.
>>
>> Can you give a different example of that being in a group the current 
>> host is not a member of and yet still being referenced?
>>
>> James Cammarata
>>
>> Ansible Lead/Sr. Principal Software Engineer
>> Ansible by Red Hat
>> twitter: @thejimic, github: jimi-c
>>
>> On Fri, Aug 5, 2016 at 4:07 PM, Artyom Aleksandrov  
>> wrote:
>>
>>> Hello,
>>>
>>> I'm using lookup_plugin which go to external service and get 
>>> variables via HTTPS protocol.
>>>
>>> Several time ago I noticed that playbooks had took more time than 
>>> before. During