Re: Something May Be Wrong with the Travis CI Tests

2018-10-14 Thread Driesprong, Fokko
Hi XD,

This is a very valid point. I think most of the Operators are still fine,
since the PythonOperator is also used quite a lot in other tests, but we
should re-enable these tests as well, as you mention. That said, the tests
themselves may be outdated, since they haven't been kept up to date. Thanks
for addressing this and picking it up.

This is also a nice opportunity for people who want to get involved in
contributing to Airflow.

Cheers, Fokko

Op za 13 okt. 2018 om 14:48 schreef Deng Xiaodong :

> Hi Fokko,
>
> I have tried your idea. You are correct: after prepending the filename with
> "test_", the CI test failed as expected (
>
> https://travis-ci.org/XD-DENG/incubator-airflow/builds/440983339?utm_source=github_status&utm_medium=notification
> ).
> It DOES relate to the test discovery.
>
> We need to tackle this issue to make sure these tests really work (by
> prepending the test file names with "test_").
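For context, the discovery rule at issue here can be sketched with a quick check against unittest's default pattern (the filenames below are from the thread; the sketch only illustrates the matching behavior):

```python
from fnmatch import fnmatch

# unittest's default discovery pattern; files that don't match it are
# silently skipped, so their tests never run.
PATTERN = "test*.py"

print(fnmatch("docker_operator.py", PATTERN))       # False: never discovered
print(fnmatch("test_docker_operator.py", PATTERN))  # True: picked up
```

This is why renaming the file is enough to make CI start (and, here, fail) the test.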
>
> But my concern is that some of these tests were never really run, and their
> corresponding operators/hooks/sensors may be very "unhealthy" (in the
> folder "tests/operators" alone, there are 9 test scripts that were not named
> correctly, i.e., never really run). We can fix the tests themselves
> quite easily, but fixing the potential "accumulated" issues in the
> corresponding operators/hooks/sensors may make this a big ticket to work
> on.
>
> Please let me know what you think.
> (I will start from DockerOperator first though).
>
>
> XD
>
> On Sat, Oct 13, 2018 at 8:02 PM Driesprong, Fokko 
> wrote:
>
> > Hi XD,
> >
> > Very good point. I was looking into this recently, but since time is
> > limited, I did not really dig into it. It has to do with the test
> > discovery. The python_operator file does not match the given pattern test*.py:
> >
> >
> https://docs.python.org/3/library/unittest.html#cmdoption-unittest-discover-p
> >
> > Could you try prepending the filename with test_, for example
> > test_python_operator.py?
> >
> > Cheers, Fokko
> >
> > Op za 13 okt. 2018 om 13:51 schreef Deng Xiaodong :
> >
> > > Hi folks, especially our committers,
> > >
> > > Something may be wrong with our Travis CI tests, unless I
> > > misunderstood/missed something.
> > >
> > > I'm checking *DockerOperator*, and some implementations inside don't make
> > > sense to me, yet no CI test has ever failed because of them. When I
> > > checked the logs of historical Travis CI builds, I was surprised to find
> > > that the DockerOperator test was never really run (you can check any of
> > > the recent Travis logs).
> > >
> > > To prove this, I forked the latest master branch and tried adding
> > > "self.assertTrue(1 == 0)" into the code of
> > > tests/operators/docker_operator.py
> > > <
> > >
> >
> https://github.com/XD-DENG/incubator-airflow/commit/2d6f47202349aa75b8d3e8e1631a285d2d75f1e7#diff-17e0452f4ce967751edfa767d46ae0ce
> > > >
> > >  and tests/operators/python_operator.py
> > > <
> > >
> >
> https://github.com/XD-DENG/incubator-airflow/commit/d7e4205f2f25dc2ea29356e4f43543f9b0bca963#diff-b5351e876d48957e2b64da5c16b0bd60
> > > >,
> > > which should certainly fail the tests. However, as I suspected, the
> > > Travis CI passed (
> > > https://github.com/XD-DENG/incubator-airflow/commits/patch-6). This
> > > means these two tests were never invoked during the Travis CI run, and I
> > > believe these two are not the only tests affected.
> > >
> > > Could someone take a look into this? If I did misunderstand or miss
> > > something, kindly let me know.
> > >
> > > Many thanks!
> > >
> > > XD
> > >
> >
>


Re: Question on Running Airflow 1.10 in Kubernetes

2018-10-14 Thread Michael Ghen
We have a similar setup with Kubernetes. We deploy (often several times)
during the day when DAG runs are active and it does kill them. Like a few
others mentioned, we do a few things to mitigate any issues this would
cause:

1. DAGs are idempotent and can be rerun with no issues (we have a few
exceptions to this, so it goes)
2. We set retries on all DAGs so when they are killed during a deploy, they
will retry before alerting us
3. We log to a GCS bucket
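Precaution 2 above can be sketched as a default_args dict; the specific values here are illustrative assumptions, not the poster's actual settings:

```python
from datetime import timedelta

# Hypothetical default_args: retry tasks killed mid-deploy before alerting.
default_args = {
    "retries": 3,                        # re-run a task killed during a deploy
    "retry_delay": timedelta(minutes=5), # wait between attempts
    "email_on_failure": True,            # alert only once retries are exhausted
    "email_on_retry": False,             # stay quiet while retrying
}

# In a DAG file this dict would be passed to the DAG constructor, roughly:
# dag = DAG("hourly_job", default_args=default_args, schedule_interval="@hourly")
```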

We often do a few deployments in a day because we don't have our local
development environments set up as well as we should. We are getting better
at building and testing DAGs locally using Docker. Still, it's not uncommon
to do one or two deploys to production in a day. We have DAG runs every hour,
24/7, and deploying while they're running hasn't been an issue given the
three precautions above.

On Sun, Oct 14, 2018 at 4:48 PM Jeff Payne  wrote:

> We have a similar airflow system, except that everything is in the same
> container image. We use GCS for task log file storage, cloudsql postgres
> for the airflow db, and conda to package our DAGs and dependencies. We
> redeploy the entire system any time we want to deploy new DAGs or changes
> to any existing DAGs, which works out to once every week or two, often in
> the middle of active DAG runs. We are careful to try to keep the DAGs
> idempotent, which helps. Regardless, being conscious of what the DAGs are
> doing at each stage also helps.
>
> I'm curious about your use cases that require multiple deployments in a
> single day...
>
>
> 
> From: Daniel Imberman 
> Sent: Sunday, October 14, 2018 8:41:58 AM
> To: dev@airflow.incubator.apache.org
> Subject: Re: Question on Running Airflow 1.10 in Kubernetes
>
> Hi pramiti,
>
> We're in the process of allowing baked-in images for the k8s executor
> (should be merged soon/possibly already merged). With this added, you can
> specify the worker image in the airflow.cfg pretty easily. The only
> potential issue with re-launching multiple times a day would be if a DAG
> was mid-execution; otherwise it should be fine.
>
> WRT worker failures with the k8s executor, you don't even need to shut down
> the workers, since the workers only last as long as the tasks do. We also
> use the k8s event stream to bubble up any worker failures to the Airflow UI.
>
> On Sun, Oct 14, 2018, 3:56 AM Pramiti Goel 
> wrote:
>
> > Hi,
> >
> > We are trying to run Airflow 1.10 in Kubernetes.
> > 1) We are running our scheduler, worker, and webserver services in
> > individual containers.
> > 2) We are using a Docker image which has Airflow 1.10 and Python 3.x. We
> > deploy our DAGs inside the Docker image.
> >
> > With the above architecture, whenever we deploy DAGs we need to build a
> > new Docker image, kill the currently running workers in Airflow, and
> > restart them with the new image.
> >
> > My question is: is killing the Airflow worker (starting/stopping the
> > airflow worker service) many times a day advisable? What risks are
> > involved if a worker doesn't shut down gracefully (which I have seen
> > quite a few times)?
> >
> > Let me know if this is not the correct place to ask.
> >
> > Thanks,
> > Pramiti
> >
>


Re: Question on Running Airflow 1.10 in Kubernetes

2018-10-14 Thread Jeff Payne
We have a similar airflow system, except that everything is in the same 
container image. We use GCS for task log file storage, cloudsql postgres for 
the airflow db, and conda to package our DAGs and dependencies. We redeploy the 
entire system any time we want to deploy new DAGs or changes to any existing 
DAGs, which works out to once every week or two, often in the middle of active 
DAG runs. We are careful to try to keep the DAGs idempotent, which helps. 
Regardless, being conscious of what the DAGs are doing at each stage also
helps.

I'm curious about your use cases that require multiple deployments in a single 
day...



From: Daniel Imberman 
Sent: Sunday, October 14, 2018 8:41:58 AM
To: dev@airflow.incubator.apache.org
Subject: Re: Question on Running Airflow 1.10 in Kubernetes

Hi pramiti,

We're in the process of allowing baked-in images for the k8s executor
(should be merged soon/possibly already merged). With this added, you can
specify the worker image in the airflow.cfg pretty easily. The only
potential issue with re-launching multiple times a day would be if a DAG
was mid-execution; otherwise it should be fine.

WRT worker failures with the k8s executor, you don't even need to shut down
the workers, since the workers only last as long as the tasks do. We also
use the k8s event stream to bubble up any worker failures to the Airflow UI.

On Sun, Oct 14, 2018, 3:56 AM Pramiti Goel  wrote:

> Hi,
>
> We are trying to run Airflow 1.10 in Kubernetes.
> 1) We are running our scheduler, worker, and webserver services in
> individual containers.
> 2) We are using a Docker image which has Airflow 1.10 and Python 3.x. We
> deploy our DAGs inside the Docker image.
>
> With the above architecture, whenever we deploy DAGs we need to build a new
> Docker image, kill the currently running workers in Airflow, and restart
> them with the new image.
>
> My question is: is killing the Airflow worker (starting/stopping the
> airflow worker service) many times a day advisable? What risks are involved
> if a worker doesn't shut down gracefully (which I have seen quite a few
> times)?
>
> Let me know if this is not the correct place to ask.
>
> Thanks,
> Pramiti
>


Re: Question on Running Airflow 1.10 in Kubernetes

2018-10-14 Thread Daniel Imberman
Hi pramiti,

We're in the process of allowing baked-in images for the k8s executor
(should be merged soon/possibly already merged). With this added, you can
specify the worker image in the airflow.cfg pretty easily. The only
potential issue with re-launching multiple times a day would be if a DAG
was mid-execution; otherwise it should be fine.

WRT worker failures with the k8s executor, you don't even need to shut down
the workers, since the workers only last as long as the tasks do. We also
use the k8s event stream to bubble up any worker failures to the Airflow UI.
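For reference, the worker image can be set in the [kubernetes] section of airflow.cfg. A minimal sketch, assuming the 1.10-era option names (check your version's default config; the repository and tag values here are hypothetical):

```ini
[kubernetes]
# Image the KubernetesExecutor launches for each task's worker pod.
worker_container_repository = gcr.io/my-project/airflow-worker
worker_container_tag = v1.10.0
```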

On Sun, Oct 14, 2018, 3:56 AM Pramiti Goel  wrote:

> Hi,
>
> We are trying to run Airflow 1.10 in Kubernetes.
> 1) We are running our scheduler, worker, and webserver services in
> individual containers.
> 2) We are using a Docker image which has Airflow 1.10 and Python 3.x. We
> deploy our DAGs inside the Docker image.
>
> With the above architecture, whenever we deploy DAGs we need to build a new
> Docker image, kill the currently running workers in Airflow, and restart
> them with the new image.
>
> My question is: is killing the Airflow worker (starting/stopping the
> airflow worker service) many times a day advisable? What risks are involved
> if a worker doesn't shut down gracefully (which I have seen quite a few
> times)?
>
> Let me know if this is not the correct place to ask.
>
> Thanks,
> Pramiti
>


How do you branch your code with BigQuery?

2018-10-14 Thread airflowuser
I believe this is quite a common case when working with data:

if something: do A
else: do B

In Python code, the BranchPythonOperator is the solution.

But when working on Google Cloud, there is no way to do this.
All existing operators are designed to continue or fail based on a comparison
with a specific value:
BigQueryValueCheckOperator with pass_value=500 will continue if 500 is
returned and fail in any other case. The same goes for all the other
CheckOperators. You must know the value in advance for this to work, and it's
not an actual branch but more of a way to stop the workflow if an unexpected
result is found.

But how do you handle a scenario where you want to do A or B based on a
condition from a query result? Nothing needs to fail; it's just a simple
branch.

XCom could solve it, but there is no XCom support yet:

https://stackoverflow.com/questions/52801318/airflow-how-to-push-xcom-value-from-bigqueryoperator

Say, for example, the query returns the number of frauds: if it's <1000 you
want to email specific users (EmailOperator); if it's >=1000 you want to run
another operator and continue the workflow.

Any thoughts on the matter will be appreciated.
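One possible shape for such a branch, taking the fraud example above: a callable that returns the task_id to follow, which is the contract BranchPythonOperator expects. The threshold, task ids, and get_fraud_count helper are all hypothetical:

```python
# Hypothetical branch callable for a BranchPythonOperator.
FRAUD_THRESHOLD = 1000  # assumed cutoff from the example above

def choose_branch(fraud_count: int) -> str:
    """Return the task_id of the downstream task to follow."""
    if fraud_count < FRAUD_THRESHOLD:
        return "email_users"   # e.g. an EmailOperator task
    return "run_followup"      # continue the workflow

# Wiring sketch (requires Airflow; get_fraud_count would fetch the query
# result, e.g. via a BigQuery hook, since XCom isn't available here):
# branch = BranchPythonOperator(
#     task_id="branch_on_fraud_count",
#     python_callable=lambda: choose_branch(get_fraud_count()),
#     dag=dag,
# )
```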

Question on Running Airflow 1.10 in Kubernetes

2018-10-14 Thread Pramiti Goel
Hi,

We are trying to run Airflow 1.10 in Kubernetes.
1) We are running our scheduler, worker, and webserver services in individual
containers.
2) We are using a Docker image which has Airflow 1.10 and Python 3.x. We
deploy our DAGs inside the Docker image.

With the above architecture, whenever we deploy DAGs we need to build a new
Docker image, kill the currently running workers in Airflow, and restart them
with the new image.

My question is: is killing the Airflow worker (starting/stopping the airflow
worker service) many times a day advisable? What risks are involved if a
worker doesn't shut down gracefully (which I have seen quite a few times)?

Let me know if this is not the correct place to ask.

Thanks,
Pramiti


Re: SqlAlchemy Pool config parameters to minimize connectivity issue impact

2018-10-14 Thread Pramiti Goel
Hi,
We also faced this issue a month back, on Airflow 1.9 with the Celery
Executor. Unfortunately, we could not find the root cause immediately. This
can occur due to too many open connections. We ended up closing all open
connections as an immediate solution and restarting the MySQL instance. But I
think what Kevin mentioned could be the cause of our problem too: we happened
to run the airflow run command many times in the week before we started
seeing such errors.
@Kevin,
My question is: when is the connection opened by the airflow run command
actually closed? Does it follow the SQL_ALCHEMY_POOL_RECYCLE (1 hour) setting
and get killed every hour? I ask because I think we saw some very old
connections still running.
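Worth noting on the recycle question: SQLAlchemy's pool_recycle does not actively kill connections on a timer; a connection older than the threshold is discarded and replaced the next time it is checked out of the pool, so idle connections can outlive the recycle interval. A sketch of the relevant airflow.cfg settings (the values are illustrative, not recommendations):

```ini
[core]
# Max persistent connections kept in the SQLAlchemy pool.
sql_alchemy_pool_size = 5
# Recycle connections older than 30 minutes at next checkout; keep this
# below MySQL's wait_timeout so the server doesn't drop them first.
sql_alchemy_pool_recycle = 1800
```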

On Fri, Sep 28, 2018 at 4:32 PM Kevin Yang  wrote:

> Hi Raman,
> Would you elaborate a bit more on what exactly the connectivity issues you
> were facing are, and which version of Airflow you are on? We previously had
> some connectivity issues when the number of connections was too large, and
> we fixed them with this PR
> .
>
> Cheers,
> Kevin Y
>
> On Tue, Sep 25, 2018 at 11:57 PM ramandu...@gmail.com <
> ramandu...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > We are observing that DAG tasks sometimes fail because of connectivity
> > issues with the MySQL server.
> > Are there any recommended settings for the MySQL pool-related parameters,
> > such as
> > sql_alchemy_pool_size = 5
> > sql_alchemy_pool_recycle = 3600
> > to minimize the impact of these connectivity issues?
> >
> > Thanks,
> > Raman Gupta
> >
>