Re: Is Using Too Many Airflow Variables in a DAG a Good Thing?

2018-10-22 Thread Marcin SzymaƄski
Hi. You are right, it's a sure way to saturate DB connections, as a connection is established every few seconds when the DAGs are parsed. The same happens when you use Variables in the __init__ of an operator. An OS environment variable would be safer for your need. Marcin On Mon, 22 Oct 2018, 08:34
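A minimal sketch of the env-var approach Marcin suggests (the variable names and defaults are illustrative, not from the thread):

    import os

    # Read owner/email from the environment at parse time; no metadata DB
    # connection is opened, unlike Variable.get().
    default_args = {
        'owner': os.environ.get('DAG_OWNER', 'data-engineering'),
        'email': os.environ.get('DAG_ALERT_EMAIL', 'de-alerts@example.com'),
    }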

Is Using Too Many Airflow Variables in a DAG a Good Thing?

2018-10-22 Thread Pramiti Goel
Hi, We want to make the owner and email ID generic, so we don't want to hard-code them in the Airflow DAG. Using Variables will help us change the email/owner later if there are a lot of DAGs with the same owner. For example: default_args = { 'owner': Variable.get('test_owner_de'), 'depends_on_past':
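The example is truncated in the archive; a plausible completion follows (everything after 'owner' is an assumption based on standard default_args usage):

    from airflow.models import Variable

    # Each Variable.get() call hits the metadata DB every time the file is parsed.
    default_args = {
        'owner': Variable.get('test_owner_de'),
        'depends_on_past': False,  # assumed; the original message cuts off here
        'email': [Variable.get('test_email_de')],  # illustrative key name
        'email_on_failure': True,
    }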

Re: Is Using Too Many Airflow Variables in a DAG a Good Thing?

2018-10-22 Thread Sai Phanindhra
On top of that, we can expire the cache on the order of a few scheduler runs (5 or 10 times one scheduler run interval). On Mon 22 Oct, 2018, 16:27 Sai Phanindhra wrote: > That's true. But variables won't change very frequently. We can cache these > variables in some place outside the Airflow ecosystem.
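A minimal sketch of the TTL idea (the helper name and the TTL value are illustrative; nothing here is from the thread):

    import time
    from airflow.models import Variable

    _cache = {}  # key -> (value, fetched_at)
    TTL = 30 * 60  # seconds; e.g. 5-10x a scheduler interval, as suggested above

    def get_variable_cached(key, default=None):
        """Return a Variable, hitting the metadata DB only after TTL expires."""
        hit = _cache.get(key)
        if hit is not None and time.time() - hit[1] < TTL:
            return hit[0]
        value = Variable.get(key, default_var=default)
        _cache[key] = (value, time.time())
        return value

Note that, as Ash points out below, a module-level dict only lives as long as the parsing process, so this mainly saves repeated lookups within a single parse.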

Re: Is Using Too Many Airflow Variables in a DAG a Good Thing?

2018-10-22 Thread Sai Phanindhra
We need to use something outside the Airflow ecosystem. For caching we could still save values in memory or on the file system, but since Airflow is distributed across multiple machines, that approach won't be very efficient. We need a caching solution outside the Airflow ecosystem. As long as its

Re: Is Using Too Many Airflow Variables in a DAG a Good Thing?

2018-10-22 Thread Ash Berlin-Taylor
Cache them where? When would it get invalidated? Given that DAG parsing happens in a sub-process, how would the cache live longer than that process? I think the change might be to use a per-process/per-thread SQLAlchemy connection when parsing DAGs, so that if a DAG needs access to the metadata DB it
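A rough sketch of the per-process connection idea (this is not Airflow's actual implementation, just an illustration of reusing one SQLAlchemy engine per parsing process):

    import os
    from sqlalchemy import create_engine

    _engine = None
    _engine_pid = None

    def get_parse_engine(db_url):
        """Create one pooled engine per process, recreated after a fork."""
        global _engine, _engine_pid
        if _engine is None or _engine_pid != os.getpid():
            # Repeated Variable lookups during a single parse then reuse
            # the same connection pool instead of opening new connections.
            _engine = create_engine(db_url, pool_pre_ping=True)
            _engine_pid = os.getpid()
        return _engine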

Re: Is Using Too Many Airflow Variables in a DAG a Good Thing?

2018-10-22 Thread Sai Phanindhra
That's true. But variables won't change very frequently. We can cache these variables somewhere outside the Airflow ecosystem, something like Redis or memcached. As queries to these stores are fast, we can reduce the latency and decrease the number of connections to the main database. This whole assumption
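A minimal read-through sketch of that pattern, assuming a local Redis and the redis-py client (all names and the TTL are illustrative):

    import redis
    from airflow.models import Variable

    r = redis.Redis(host='localhost', port=6379)  # assumed deployment detail
    TTL = 600  # seconds; illustrative

    def get_variable_via_redis(key):
        """Serve a Variable from Redis, falling back to the metadata DB on a miss."""
        cached = r.get('airflow_var:' + key)
        if cached is not None:
            return cached.decode('utf-8')
        value = Variable.get(key)
        r.setex('airflow_var:' + key, TTL, value)
        return value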

Re: Is Using Too Many Airflow Variables in a DAG a Good Thing?

2018-10-22 Thread Ash Berlin-Taylor
Redis is not a requirement of Airflow currently, nor should it become a hard requirement. Benchmarks are definitely needed before we bring in anything as complex as a cache. Queries to the variables table _should_ be fast too - even if it's got 1000 rows in it, that is tiny by

Re: Is Using Too Many Airflow Variables in a DAG a Good Thing?

2018-10-22 Thread Sai Phanindhra
Why don't we cache variables? We can fairly assume that variables won't get changed very frequently (not as frequently as the scheduler's DAG run interval). We can keep the default timeout at a few times the scheduler run interval. This will help control the number of connections to the database and reduce load both on the scheduler

Pass execution date as a datetime object to MongoToS3Operator

2018-10-22 Thread Kyle Hamlin
Hi, I'm having an issue where I want to pass the DAG's execution_date to the query parameter in the MongoToS3Operator via templating. The templating works properly; however, it appears that pymongo will only filter date fields when passed a datetime object, and while the underlying object in the
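A common workaround (a sketch only; the subclass, the 'created_at' field, and the one-day window are assumptions, not from the thread) is to skip templating for the date and inject a real datetime just before execution:

    from datetime import timedelta

    from airflow.contrib.operators.mongo_to_s3 import MongoToS3Operator

    class MongoToS3DatetimeOperator(MongoToS3Operator):
        """Injects the real execution_date into the Mongo query at run time,
        so pymongo receives a datetime object instead of a rendered string."""

        def execute(self, context):
            start = context['execution_date']
            # Filter a one-schedule-interval window; field name is illustrative.
            self.mongo_query['created_at'] = {
                '$gte': start,
                '$lt': start + timedelta(days=1),
            }
            return super().execute(context)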