Re: variable scope with dynamic dags

2017-03-23 Thread Boris Tyukin
Well, I did more testing today using guppy to measure memory consumption. I was also watching processes and memory with htop while kicking off dags. My test Python object was defined like this:
    payload = [1] * (2 * 10 ** 7)  # 152 MB
As Jeremiah said, the entire python code that generates dags is
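
A rough cross-check of the 152 MB figure, as a minimal sketch that uses only the standard library rather than guppy. It assumes 64-bit CPython, where each list slot is an 8-byte pointer; `list_size_mib` is a hypothetical helper name, not something from the thread.

```python
import sys


def list_size_mib(n):
    """Return the shallow size, in MiB, of a list holding n references to 1."""
    payload = [1] * n
    # getsizeof counts the list header plus one pointer per slot. It does not
    # count the int object itself, which is shared by every slot.
    return sys.getsizeof(payload) / 2 ** 20


if __name__ == "__main__":
    # Same shape as the test object in the message above.
    print(round(list_size_mib(2 * 10 ** 7), 1))
```

On a 64-bit build this should land close to the 152 MB noted in the comment, since 2 * 10 ** 7 pointers of 8 bytes each come to about 152.6 MiB. It measures only the list itself, not the whole process, so it will not match what htop reports.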

Re: variable scope with dynamic dags

2017-03-22 Thread Boris Tyukin
Thanks Jeremiah, this is exactly what was bugging me. I am going to rewrite that code and look at persistent storage. Your explanation helped, thanks!
On Wed, Mar 22, 2017 at 2:29 PM, Jeremiah Lowin wrote:
> In vanilla Python, your DAGs will all reference the same object, so

Re: variable scope with dynamic dags

2017-03-22 Thread Jeremiah Lowin
In vanilla Python, your DAGs will all reference the same object, so when your DAG file is parsed and 200 DAGs are created, there will still be only one 60 MB dict object created (I say vanilla because there are obviously ways to create copies of the object). HOWEVER, you should assume that each
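
The shared-reference point can be illustrated in plain Python, without Airflow. This is a minimal sketch: `make_fake_dag` is a hypothetical stand-in for whatever builds each DAG, and `my_outer_dict` here is a tiny placeholder for the 60 MB dict.

```python
my_outer_dict = {"key": "value"}  # placeholder for the large dict


def make_fake_dag(dag_id, shared):
    """Hypothetical stand-in for DAG creation; it only stores a reference."""
    return {"dag_id": dag_id, "config": shared}


fake_dags = [make_fake_dag("dag_%d" % i, my_outer_dict) for i in range(200)]

# Every generated object points at the same dict; nothing was copied.
same_object = all(d["config"] is my_outer_dict for d in fake_dags)
print(same_object)  # True
```

Passing the dict around hands over a reference, so 200 generated objects still share one dict within a single parse of the file.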

Re: variable scope with dynamic dags

2017-03-22 Thread Boris Tyukin
Hi Jeremiah, thanks for the explanation! I am very new to Python, so I was surprised that it works and that my external dictionary object was still accessible to all the generated dags. I think it makes sense, but I would like to confirm one thing, and I do not know how to test it myself. Do you think that

Re: variable scope with dynamic dags

2017-03-22 Thread Jeremiah Lowin
At the risk of oversimplifying things, your DAG definition file is loaded *every* time a DAG (or any task in that DAG) is run. Think of it as a literal Python import of your dag-defining module: any variables are loaded along with the DAGs, which are then executed. That's why your dict is always
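
The "literal Python import" analogy can be sketched as follows. This is only an in-process imitation under stated assumptions: `DAG_FILE_SOURCE` and `load_dag_file` are hypothetical names, and how Airflow itself schedules the loads is not shown here.

```python
# Stand-in for the contents of a DAG definition file.
DAG_FILE_SOURCE = """
my_outer_dict = {"built": True}  # module-level object, rebuilt on every load
"""


def load_dag_file(source):
    """Execute the file's source in a fresh namespace, like a new import."""
    namespace = {}
    exec(source, namespace)
    return namespace


first = load_dag_file(DAG_FILE_SOURCE)
second = load_dag_file(DAG_FILE_SOURCE)

# Equal contents, but two separate objects: each load built its own dict.
print(first["my_outer_dict"] == second["my_outer_dict"])  # True
print(first["my_outer_dict"] is second["my_outer_dict"])  # False
```

Each load runs the module-level code again, so any variable defined at the top of the file is rebuilt along with the DAGs.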

variable scope with dynamic dags

2017-03-22 Thread Boris Tyukin
Hi, I have a weird question that has been bugging me. I have some code like below to generate dags dynamically, using Max's example code from the FAQ. It works fine, but I have one large dict (let's call it my_outer_dict) that takes over 60 MB in memory, and I need to access it from all generated dags. Needless