[jira] [Assigned] (AIRFLOW-3501) Add config option to load dags in an image with the kubernetes executor.
[ https://issues.apache.org/jira/browse/AIRFLOW-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-3501:
Assignee: Kevin Pullin

Key: AIRFLOW-3501
URL: https://issues.apache.org/jira/browse/AIRFLOW-3501
Project: Apache Airflow
Issue Type: Improvement
Components: kubernetes
Reporter: Kevin Pullin
Assignee: Kevin Pullin
Priority: Major

Currently the Airflow Kubernetes executor forces DAGs to be loaded either from a volume claim or from an init container. There should be an option to bypass these settings and instead use DAGs packaged into the running image.

The motivation for this change is to allow an Airflow image to be built and released via a CI/CD pipeline upon a new commit to a DAG repository. For example, given a new git commit to a DAG repo, a CI/CD server can build an Airflow Docker image, run tests against the current DAGs, and finally push the entire bundle to Kubernetes as a single, complete, well-known unit. There is no need to worry about a git init container failing, or to maintain a separate pipeline for updating DAGs on a shared volume. And if issues arise from an update, the configuration can easily be rolled back to the prior version of the image.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
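A minimal sketch of what such a bypass could look like in `airflow.cfg`; the `dags_in_image` option name and the path below are illustrative assumptions for this proposal, not confirmed settings:

```ini
[kubernetes]
# Hypothetical flag: when True, skip the volume-claim / git-init-container
# DAG loading paths and use the DAGs baked into the worker image instead.
dags_in_image = True
# Path inside the image where the CI/CD pipeline copied the DAGs.
dags_folder = /usr/local/airflow/dags
```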
[jira] [Assigned] (AIRFLOW-3452) Cannot view dags at /home page
[ https://issues.apache.org/jira/browse/AIRFLOW-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-3452:
Assignee: (was: Jinhui Zhang)

Key: AIRFLOW-3452
URL: https://issues.apache.org/jira/browse/AIRFLOW-3452
Project: Apache Airflow
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Jinhui Zhang
Priority: Blocker

I checked out the latest master branch (commit {{[9dce1f0|https://github.com/apache/incubator-airflow/commit/9dce1f0740f69af0ee86709a1a34a002b245aa3e]}}) and restarted my Airflow webserver, but I cannot view any DAG on the home page. I inspected the frontend code and found there is a {{style="display:none;"}} on the {{main-content}} element, and the source code sets it at [https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/templates/airflow/dags.html#L31]. Is this a known issue? How should I fix it?

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3452) Cannot view dags at /home page
[ https://issues.apache.org/jira/browse/AIRFLOW-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-3452:
Assignee: Jinhui Zhang

Key: AIRFLOW-3452
URL: https://issues.apache.org/jira/browse/AIRFLOW-3452
Project: Apache Airflow
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Jinhui Zhang
Assignee: Jinhui Zhang
Priority: Blocker

I checked out the latest master branch (commit {{[9dce1f0|https://github.com/apache/incubator-airflow/commit/9dce1f0740f69af0ee86709a1a34a002b245aa3e]}}) and restarted my Airflow webserver, but I cannot view any DAG on the home page. I inspected the frontend code and found there is a {{style="display:none;"}} on the {{main-content}} element, and the source code sets it at [https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/templates/airflow/dags.html#L31]. Is this a known issue? How should I fix it?

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-987:
Assignee: (was: Pratap20)

Key: AIRFLOW-987
URL: https://issues.apache.org/jira/browse/AIRFLOW-987
Project: Apache Airflow
Issue Type: Bug
Components: security
Affects Versions: 1.8.0
Environment: 1.8-rc5
Reporter: Ruslan Dautkhanov
Priority: Major
Labels: easyfix, kerberos, security

No matter which arguments I pass to `airflow kerberos`, it always executes as `kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow`. So it fails with the expected "kinit: Keytab contains no suitable keys for airf...@corp.some.com while getting initial credentials".

I tried different arguments, -kt and --keytab; here's one of the runs (some lines wrapped for readability):
{noformat}
$ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
[2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
[2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from keytab:
kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
[2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
Couldn't reinit from keytab! `kinit' exited with 1.
kinit: Keytab contains no suitable keys for airf...@corp.some.com
while getting initial credentials
{noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
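The reported behavior suggests the CLI arguments never reach the `kinit` command construction. A minimal sketch of the intended precedence; the function and parameter names here are illustrative, not Airflow's actual internals:

```python
def build_kinit_command(principal=None, keytab=None,
                        conf_principal="airflow",
                        conf_keytab="airflow.keytab",
                        ccache="/tmp/airflow_krb5_ccache",
                        reinit_frequency="3600m"):
    """Build the kinit invocation, letting CLI-provided --principal and
    --keytab override the values from airflow.cfg. (The reported bug is
    that the CLI values were ignored and the config defaults always won.)
    """
    principal = principal or conf_principal
    keytab = keytab or conf_keytab
    return ["kinit", "-r", reinit_frequency, "-k", "-t", keytab,
            "-c", ccache, principal]

# Without overrides, the configured defaults apply:
default_cmd = build_kinit_command()
# With explicit CLI arguments, they must win over the config:
cli_cmd = build_kinit_command(principal="someuser@EXAMPLE.COM",
                              keytab="/tmp/user.keytab")
```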
[jira] [Assigned] (AIRFLOW-1856) How to allow airflow dags for concrete user(s) only?
[ https://issues.apache.org/jira/browse/AIRFLOW-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1856:
Assignee: Lokesh Chinnaga

Key: AIRFLOW-1856
URL: https://issues.apache.org/jira/browse/AIRFLOW-1856
Project: Apache Airflow
Issue Type: Bug
Components: authentication, ui, webapp
Reporter: Ikar Pohorsky
Assignee: Lokesh Chinnaga
Priority: Major

The problem is pretty simple: I need to limit Airflow web users to seeing and executing only certain DAGs and tasks.

If possible, I'd prefer not to use [Kerberos|https://airflow.incubator.apache.org/security.html#kerberos] or [OAuth|https://airflow.incubator.apache.org/security.html#oauth-authentication]. The [Multi-tenancy|https://airflow.incubator.apache.org/security.html#multi-tenancy] option seems like the way to go, but I couldn't make it work the way I expect.

My current setup:
* added Airflow web users _test_ and _ikar_ via [Web Authentication / Password|https://airflow.incubator.apache.org/security.html#password]
* my unix username is _ikar_ with a home in _/home/ikar_
* no _test_ unix user
* Airflow _1.8.2_ is installed in _/home/ikar/airflow_
* added two DAGs with one task each:
** one with _owner_ set to _ikar_
** one with _owner_ set to _test_
* airflow.cfg:
{code}
[core]
# The home folder for airflow, default is ~/airflow
airflow_home = /home/ikar/airflow

# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository
# This path must be absolute
dags_folder = /home/ikar/airflow-test/dags

# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = /home/ikar/airflow/logs

# Airflow can store logs remotely in AWS S3 or Google Cloud Storage. Users
# must supply a remote location URL (starting with either 's3://...' or
# 'gs://...') and an Airflow connection id that provides access to the storage
# location.
remote_base_log_folder =
remote_log_conn_id =
# Use server-side encryption for logs stored in S3
encrypt_s3_logs = False

# DEPRECATED option for remote log storage, use remote_base_log_folder instead!
s3_log_folder =

# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor
executor = SequentialExecutor

# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engine, more information
# their website
sql_alchemy_conn = sqlite:home/ikar/airflow/airflow.db

# The SqlAlchemy pool size is the maximum number of database connections
# in the pool.
sql_alchemy_pool_size = 5

# The SqlAlchemy pool recycle is the number of seconds a connection
# can be idle in the pool before it is invalidated. This config does
# not apply to sqlite.
sql_alchemy_pool_recycle = 3600

# The amount of parallelism as a setting to the executor. This defines
[jira] [Assigned] (AIRFLOW-1856) How to allow airflow dags for concrete user(s) only?
[ https://issues.apache.org/jira/browse/AIRFLOW-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1856:
Assignee: (was: Lokesh Chinnaga)

Key: AIRFLOW-1856
URL: https://issues.apache.org/jira/browse/AIRFLOW-1856
Project: Apache Airflow
Issue Type: Bug
Components: authentication, ui, webapp
Reporter: Ikar Pohorsky
Priority: Major

The problem is pretty simple: I need to limit Airflow web users to seeing and executing only certain DAGs and tasks.

If possible, I'd prefer not to use [Kerberos|https://airflow.incubator.apache.org/security.html#kerberos] or [OAuth|https://airflow.incubator.apache.org/security.html#oauth-authentication]. The [Multi-tenancy|https://airflow.incubator.apache.org/security.html#multi-tenancy] option seems like the way to go, but I couldn't make it work the way I expect.

My current setup:
* added Airflow web users _test_ and _ikar_ via [Web Authentication / Password|https://airflow.incubator.apache.org/security.html#password]
* my unix username is _ikar_ with a home in _/home/ikar_
* no _test_ unix user
* Airflow _1.8.2_ is installed in _/home/ikar/airflow_
* added two DAGs with one task each:
** one with _owner_ set to _ikar_
** one with _owner_ set to _test_
* airflow.cfg:
{code}
[core]
# The home folder for airflow, default is ~/airflow
airflow_home = /home/ikar/airflow

# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository
# This path must be absolute
dags_folder = /home/ikar/airflow-test/dags

# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = /home/ikar/airflow/logs

# Airflow can store logs remotely in AWS S3 or Google Cloud Storage. Users
# must supply a remote location URL (starting with either 's3://...' or
# 'gs://...') and an Airflow connection id that provides access to the storage
# location.
remote_base_log_folder =
remote_log_conn_id =
# Use server-side encryption for logs stored in S3
encrypt_s3_logs = False

# DEPRECATED option for remote log storage, use remote_base_log_folder instead!
s3_log_folder =

# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor
executor = SequentialExecutor

# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engine, more information
# their website
sql_alchemy_conn = sqlite:home/ikar/airflow/airflow.db

# The SqlAlchemy pool size is the maximum number of database connections
# in the pool.
sql_alchemy_pool_size = 5

# The SqlAlchemy pool recycle is the number of seconds a connection
# can be idle in the pool before it is invalidated. This config does
# not apply to sqlite.
sql_alchemy_pool_recycle = 3600

# The amount of parallelism as a setting to the executor. This defines
# the max number of task
[jira] [Assigned] (AIRFLOW-3240) Airflow dags are not working (not starting t1)
[ https://issues.apache.org/jira/browse/AIRFLOW-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-3240:
Assignee: (was: Ivan Vitoria)

Key: AIRFLOW-3240
URL: https://issues.apache.org/jira/browse/AIRFLOW-3240
Project: Apache Airflow
Issue Type: Task
Components: DAG, DagRun
Affects Versions: 1.8.0
Reporter: Pandu
Priority: Critical

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3240) Airflow dags are not working (not starting t1)
[ https://issues.apache.org/jira/browse/AIRFLOW-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-3240:
Assignee: Ivan Vitoria

Key: AIRFLOW-3240
URL: https://issues.apache.org/jira/browse/AIRFLOW-3240
Project: Apache Airflow
Issue Type: Task
Components: DAG, DagRun
Affects Versions: 1.8.0
Reporter: Pandu
Assignee: Ivan Vitoria
Priority: Critical

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2064) Polish timezone implementation
[ https://issues.apache.org/jira/browse/AIRFLOW-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2064:
Assignee: Marcus Rehm

Key: AIRFLOW-2064
URL: https://issues.apache.org/jira/browse/AIRFLOW-2064
Project: Apache Airflow
Issue Type: Improvement
Reporter: Bolke de Bruin
Assignee: Marcus Rehm
Priority: Blocker
Fix For: 1.10.0

A couple of things are left over after moving to time zone support:
# end_dates within DAGs should be converted to UTC by using the time zone of start_date if naive
# Task instances that are instantiated without timezone information for their execution_date should convert those to UTC by using the DAG's timezone or the configured default
# Some doc polishing
# Tests should be added that cover more of the edge cases

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
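Item 1 amounts to interpreting a naive end_date in the start_date's time zone and then converting it to UTC. A rough stdlib sketch of that rule (Airflow's actual implementation differs and uses pendulum):

```python
from datetime import datetime, timedelta, timezone

def coerce_to_utc(dt, default_tz):
    """If dt is naive, interpret it in default_tz; then convert to UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=default_tz)
    return dt.astimezone(timezone.utc)

# A DAG whose start_date is in UTC+2; a naive end_date inherits that zone.
plus_two = timezone(timedelta(hours=2))
start_date = datetime(2018, 1, 1, 8, 0, tzinfo=plus_two)
end_date = coerce_to_utc(datetime(2018, 6, 1, 8, 0), start_date.tzinfo)
# end_date is now 2018-06-01 06:00 UTC
```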
[jira] [Assigned] (AIRFLOW-2064) Polish timezone implementation
[ https://issues.apache.org/jira/browse/AIRFLOW-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2064:
Assignee: (was: Alex Lumpov)

Key: AIRFLOW-2064
URL: https://issues.apache.org/jira/browse/AIRFLOW-2064
Project: Apache Airflow
Issue Type: Improvement
Reporter: Bolke de Bruin
Priority: Blocker
Fix For: 1.10.0

A couple of things are left over after moving to time zone support:
# end_dates within DAGs should be converted to UTC by using the time zone of start_date if naive
# Task instances that are instantiated without timezone information for their execution_date should convert those to UTC by using the DAG's timezone or the configured default
# Some doc polishing
# Tests should be added that cover more of the edge cases

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2064) Polish timezone implementation
[ https://issues.apache.org/jira/browse/AIRFLOW-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2064:
Assignee: Alex Lumpov

Key: AIRFLOW-2064
URL: https://issues.apache.org/jira/browse/AIRFLOW-2064
Project: Apache Airflow
Issue Type: Improvement
Reporter: Bolke de Bruin
Assignee: Alex Lumpov
Priority: Blocker
Fix For: 1.10.0

A couple of things are left over after moving to time zone support:
# end_dates within DAGs should be converted to UTC by using the time zone of start_date if naive
# Task instances that are instantiated without timezone information for their execution_date should convert those to UTC by using the DAG's timezone or the configured default
# Some doc polishing
# Tests should be added that cover more of the edge cases

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3134) Add New Operator - MySQLToS3TransformOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-3134:
Assignee: (was: Stefano Francavilla)

Key: AIRFLOW-3134
URL: https://issues.apache.org/jira/browse/AIRFLOW-3134
Project: Apache Airflow
Issue Type: New Feature
Components: operators
Affects Versions: 1.10.0
Reporter: Stefano Francavilla
Priority: Minor
Labels: MissingFeature, operators

Taking inspiration from the [S3Transform Operator|https://github.com/apache/incubator-airflow/blob/master/airflow/operators/s3_file_transform_operator.py] and from a use case I worked on in the past weeks, I was wondering whether it would be useful to add a new operator: "MySQLToS3TransformOperator". The operator would transfer (transformed) data resulting from a SELECT statement to an S3 bucket.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
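A rough sketch of the core logic such an operator might have. Everything here is hypothetical: a real implementation would use MySqlHook and S3Hook rather than the stub callables below, which keep the example self-contained:

```python
import csv
import io

def mysql_to_s3_transform(run_query, transform, upload, query, s3_key):
    """Hypothetical core of a MySQLToS3TransformOperator.

    run_query(query) -> (column_names, rows)  # stands in for MySqlHook
    transform(rows)  -> rows                  # user-supplied transformation
    upload(key, data: bytes)                  # stands in for S3Hook.load_bytes
    """
    columns, rows = run_query(query)
    rows = transform(rows)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    upload(s3_key, buf.getvalue().encode("utf-8"))

# Usage with stubs in place of the database and S3:
uploaded = {}
mysql_to_s3_transform(
    run_query=lambda q: (["id", "name"], [(1, "a"), (2, "b")]),
    transform=lambda rows: [r for r in rows if r[0] > 1],
    upload=lambda key, data: uploaded.update({key: data}),
    query="SELECT id, name FROM t",
    s3_key="s3://bucket/out.csv",
)
```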
[jira] [Assigned] (AIRFLOW-2794) Add delete support for Azure blob
[ https://issues.apache.org/jira/browse/AIRFLOW-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2794:
Assignee: Bart Eijk

Key: AIRFLOW-2794
URL: https://issues.apache.org/jira/browse/AIRFLOW-2794
Project: Apache Airflow
Issue Type: Wish
Components: hooks, operators
Reporter: Bart Eijk
Assignee: Bart Eijk
Priority: Trivial

As a developer, I would like the ability to create tasks that can delete files in Azure blob storage.

Nice to have: the ability to delete a "folder", i.e. a prefix.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2747) Explicit re-schedule of sensors
[ https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2747:
Assignee: (was: Stefan Seelmann)

Key: AIRFLOW-2747
URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
Project: Apache Airflow
Issue Type: Improvement
Components: core, operators
Affects Versions: 1.9.0
Reporter: Stefan Seelmann
Priority: Major
Fix For: 2.0.0
Attachments: Screenshot_2018-07-12_14-10-24.png, Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png

By default sensors block a worker and just sleep between pokes. This is very inefficient, especially when there are many long-running sensors.

There is a hacky workaround of setting a small timeout value and a high retry number, but that has drawbacks:
* Errors raised by sensors are hidden and the sensor retries too often
* The sensor is retried at a fixed time interval (with optional exponential backoff)
* There are many attempts and many log files are generated

I'd like to propose an explicit reschedule mechanism:
* A new "reschedule" flag for sensors; if set to True it will raise an AirflowRescheduleException that causes a reschedule.
* AirflowRescheduleException contains the (earliest) re-schedule date.
* Reschedule requests are recorded in a new `task_reschedule` table and visualized in the Gantt view.
* A new TI dependency that checks if a sensor task is ready to be re-scheduled.

Advantages:
* This change is backward compatible. Existing sensors behave like before, but it's possible to set the "reschedule" flag.
* The poke_interval, timeout, and soft_fail parameters are still respected and used to calculate the next schedule time.
* Custom sensor implementations can even define the next sensible schedule date by raising AirflowRescheduleException themselves.
* Existing TimeSensor and TimeDeltaSensor can also be changed to be rescheduled when the time is reached.
* This mechanism can also be used by non-sensor operators (but then the new ReadyToRescheduleDep has to be added to deps or BaseOperator).

Design decisions and caveats:
* When handling AirflowRescheduleException the `try_number` is decremented. That means subsequent runs use the same try number and write to the same log file.
* The sensor TI dependency check now depends on the `task_reschedule` table. However, only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.

Open questions and TODOs:
* Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting the state back to `NONE`? This would require more changes in scheduler code and especially in the UI, but the state of a task would be more explicit and more transparent to the user.
* Add an example/test for a non-sensor operator
* Document the new feature

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
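The proposed mechanism can be sketched in plain Python. AirflowRescheduleException and the "reschedule" flag follow the proposal's naming; the surrounding harness is illustrative only, not the real BaseSensorOperator:

```python
from datetime import datetime, timedelta

class AirflowRescheduleException(Exception):
    """Carries the earliest date at which the task should run again."""
    def __init__(self, reschedule_date):
        self.reschedule_date = reschedule_date

def execute_sensor(poke, poke_interval=60, reschedule=False,
                   now=datetime.utcnow):
    """One scheduling slot of a sensor.

    With reschedule=False the worker would keep the slot and sleep between
    pokes; with reschedule=True the sensor pokes once and, if the condition
    is not yet met, raises AirflowRescheduleException so the slot is freed
    and the scheduler can re-run it at the requested time.
    """
    if poke():
        return "success"
    if reschedule:
        raise AirflowRescheduleException(
            now() + timedelta(seconds=poke_interval))
    return "keep_polling"  # stand-in for the old sleep-and-retry behavior
```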
[jira] [Assigned] (AIRFLOW-2156) Parallelize Celery Executor
[ https://issues.apache.org/jira/browse/AIRFLOW-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2156:
Assignee: Kevin Yang (was: Dan Davydov)

Key: AIRFLOW-2156
URL: https://issues.apache.org/jira/browse/AIRFLOW-2156
Project: Apache Airflow
Issue Type: Improvement
Components: celery
Reporter: Dan Davydov
Assignee: Kevin Yang
Priority: Major

The CeleryExecutor doesn't currently check task states in parallel, since Celery does not support this. This can greatly slow down the scheduler loop, because each request to check a task's state is a network request.

The Celery executor should parallelize these requests.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
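Since each state check is a network round-trip, fanning the checks out over a thread pool is the obvious shape of the fix. This sketch uses a stub `fetch_state` in place of looking up `celery.result.AsyncResult(task_id).state`; it is an illustration of the idea, not the executor's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_states_parallel(task_ids, fetch_state, max_workers=16):
    """Check many task states concurrently instead of issuing one
    blocking network request at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        states = list(pool.map(fetch_state, task_ids))
    return dict(zip(task_ids, states))

# Usage with a stub state lookup:
states = fetch_states_parallel(
    ["t1", "t2", "t3"],
    fetch_state=lambda task_id: "SUCCESS" if task_id != "t2" else "STARTED",
)
```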
[jira] [Assigned] (AIRFLOW-1555) Backfill job gets killed 1 hour after starting
[ https://issues.apache.org/jira/browse/AIRFLOW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1555:
Assignee: Shreyas Joshi

Key: AIRFLOW-1555
URL: https://issues.apache.org/jira/browse/AIRFLOW-1555
Project: Apache Airflow
Issue Type: Bug
Components: backfill
Affects Versions: 1.8.1
Environment: Airflow 1.8.1
Celery 3.1.23 with one coordinator, redis and 3 workers
Python 3.5.2
Debian GNU/Linux 8.9 (jessie)
snakebite uninstalled because it does not work with Python 3.5.2
MySQL 5.6
Reporter: Shreyas Joshi
Assignee: Shreyas Joshi
Priority: Major
Fix For: 1.10.0

*What happens?*
After running for an hour, tasks in a backfill die. The task log shows:
{code}
...
[2017-08-31 06:48:06,425] {jobs.py:2172} WARNING - Recorded pid 5451 is not a descendant of the current pid 21571
[2017-08-31 06:48:11,884] {jobs.py:2179} WARNING - State of this instance has been externally set to failed. Taking the poison pill. So long.
[2017-08-31 06:48:11,892] {helpers.py:220} WARNING - Terminating descendant processes of [] PID: 5451
[2017-08-31 06:48:11,892] {helpers.py:224} WARNING - Terminating descendant process [] PID: 5459
[2017-08-31 06:48:11,896] {helpers.py:231} WARNING - Waiting up to 5s for processes to exit...
...
{code}
The backfill log shows:
{code}
...
[2017-08-31 11:23:44,025] {jobs.py:1729} ERROR - Executor reports task instance finished (failed) although the task says its running. Was the task killed externally?
[2017-08-31 11:23:44,025] {models.py:1427} ERROR - Executor reports task instance [running]> finished (failed) although the task says its running. Was the task killed externally?
...
{code}
The Celery UI has the following exception, but the status shows "success":
{code}
Traceback (most recent call last):
  File "/data/airflow-sources/.venv/lib/python3.5/site-packages/airflow/executors/celery_executor.py", line 56, in execute_command
    subprocess.check_call(command, shell=True)
  File "/usr/share/pyenv/versions/3.5.2/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'airflow run dag_name task_name 2017-08-30T02:00:00 --pickle 14 --local' returned non-zero exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/airflow-sources/.venv/lib/python3.5/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/data/airflow-sources/.venv/lib/python3.5/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/data/airflow-sources/.venv/lib/python3.5/site-packages/airflow/executors/celery_executor.py", line 59, in execute_command
    raise AirflowException('Celery command failed')
airflow.exceptions.AirflowException: Celery command failed
{code}
The tasks have timeouts explicitly set to 6 hours and an SLA set to 5 hours. In the course of debugging this I also set dagrun_timeout to 6 hours; it did not make a difference.

Here is a thread on [stackoverflow|https://stackoverflow.com/questions/44274381/airflow-long-running-task-in-subdag-marked-as-failed-after-an-hour] that discusses a very similar issue.

These tasks run fine on our older Airflow 1.7. This is currently blocking our upgrade.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2062) Support fine-grained Connection encryption
[ https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2062:
Assignee: Jasper Kahn

Key: AIRFLOW-2062
URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
Project: Apache Airflow
Issue Type: Improvement
Components: contrib
Reporter: Wilson Lian
Assignee: Jasper Kahn
Priority: Minor

This effort targets containerized tasks (e.g., those launched by the KubernetesExecutor). Under that paradigm, each task could potentially operate under different credentials, and fine-grained Connection encryption will enable an administrator to restrict which connections can be accessed by which tasks.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1252) Experimental API - exception when conf is present in JSON body
[ https://issues.apache.org/jira/browse/AIRFLOW-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1252:
Assignee: (was: Sergio Herrera)

Key: AIRFLOW-1252
URL: https://issues.apache.org/jira/browse/AIRFLOW-1252
Project: Apache Airflow
Issue Type: Bug
Components: api
Affects Versions: Airflow 1.8, 1.8.1
Reporter: Sergio Herrera
Priority: Major
Labels: api

When someone calls the endpoint _POST :/api/experimental/dags//dag_runs {}_, Airflow never runs the request if its body contains _conf_.

This occurs due to a mismatch between types when calling the function _trigger_dag()_, which is also used by the *CLI*. That function performs a _json.loads(conf)_ because from the CLI the type of conf is _string_; from the *experimental API*, however, the type is _dict_ (because the _Json_ body is processed beforehand to get all the data, such as execution_date).

There are two possibilities:
1. Look for every use of the _trigger_dag()_ function and move the _Json_ parsing outside the function.
2. In the *experimental API*, serialize conf to a string (with _json.dumps()_) so that _trigger_dag()_ can transform it back into a _dict_.

I have implemented the second option, so I can open a PR with it if you want. Thank you a lot.

EDIT: Also, there are currently no tests that use conf in the Json passed through the request.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
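The type mismatch is easy to demonstrate with plain `json`; `trigger_dag_conf` below is a stand-in for the real `trigger_dag()`, reduced to the one behavior that matters here (it unconditionally `json.loads()` its conf argument):

```python
import json

def trigger_dag_conf(conf):
    """Stand-in for trigger_dag(): the CLI passes conf as a JSON string,
    so the function calls json.loads() on it."""
    return json.loads(conf) if conf else {}

# CLI path: conf arrives as a string -- works.
cli_result = trigger_dag_conf('{"key": "value"}')

# Experimental API path: the request body was already parsed into a dict,
# so json.loads(dict) raises TypeError -- the reported bug.
body_conf = {"key": "value"}
try:
    trigger_dag_conf(body_conf)
    failed = False
except TypeError:
    failed = True

# Proposed fix (option 2): re-serialize before calling trigger_dag().
roundtripped = trigger_dag_conf(json.dumps(body_conf))
```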
[jira] [Assigned] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running
[ https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1104:
Assignee: (was: Tao Feng)

Key: AIRFLOW-1104
URL: https://issues.apache.org/jira/browse/AIRFLOW-1104
Project: Apache Airflow
Issue Type: Bug
Environment: see https://github.com/apache/incubator-airflow/pull/2221
"Tasks with the QUEUED state should also be counted below, but for now we cannot count them. This is because there is no guarantee that queued tasks in failed dagruns will or will not eventually run and queued tasks that will never run will consume slots and can stall a DAG. Once we can guarantee that all queued tasks in failed dagruns will never run (e.g. make sure that all running/newly queued TIs have running dagruns), then we can include QUEUED tasks here, with the constraint that they are in running dagruns."
Reporter: Alex Guziel
Priority: Minor

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1764) Web Interface should not use experimental api
[ https://issues.apache.org/jira/browse/AIRFLOW-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1764:
Assignee: Niels Zeilemaker (was: 黄晓明)

Key: AIRFLOW-1764
URL: https://issues.apache.org/jira/browse/AIRFLOW-1764
Project: Apache Airflow
Issue Type: Bug
Components: api
Reporter: Niels Zeilemaker
Assignee: Niels Zeilemaker
Priority: Major
Fix For: 1.9.0

The web interface should not use the experimental API, as the authentication options differ between the two. This means that the latest_runs call should be moved into the web interface.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1764) Web Interface should not use experimental api
[ https://issues.apache.org/jira/browse/AIRFLOW-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1764:
Assignee: 黄晓明 (was: Niels Zeilemaker)

Key: AIRFLOW-1764
URL: https://issues.apache.org/jira/browse/AIRFLOW-1764
Project: Apache Airflow
Issue Type: Bug
Components: api
Reporter: Niels Zeilemaker
Assignee: 黄晓明
Priority: Major
Fix For: 1.9.0

The web interface should not use the experimental API, as the authentication options differ between the two. This means that the latest_runs call should be moved into the web interface.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2462) airflow.contrib.auth.backends.password_auth.PasswordUser exists bug
[ https://issues.apache.org/jira/browse/AIRFLOW-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2462:
Assignee: froginwell

Key: AIRFLOW-2462
URL: https://issues.apache.org/jira/browse/AIRFLOW-2462
Project: Apache Airflow
Issue Type: Bug
Components: authentication, contrib
Affects Versions: 1.9.0
Reporter: froginwell
Assignee: froginwell
Priority: Blocker

PasswordUser:
{code}
@password.setter
def _set_password(self, plaintext):
    self._password = generate_password_hash(plaintext, 12)
    if PY3:
        self._password = str(self._password, 'utf-8')
{code}
_set_password should be renamed to password.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
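The underlying Python property quirk can be shown without the SQLAlchemy/Flask machinery of the real class: a setter decorated with `@password.setter` but named `_set_password` binds the combined property to the name `_set_password`, leaving `password` itself read-only. The `"hash(...)"` string below is a stand-in for `generate_password_hash`:

```python
class User:
    def __init__(self):
        self._password = None

    @property
    def password(self):
        return self._password

    # Buggy pattern: the decorated method is named _set_password, so the
    # property-with-setter is bound to _set_password, while the name
    # "password" still refers to the getter-only property above.
    @password.setter
    def _set_password(self, plaintext):
        self._password = "hash(" + plaintext + ")"

class FixedUser:
    def __init__(self):
        self._password = None

    @property
    def password(self):
        return self._password

    @password.setter
    def password(self, plaintext):  # renamed: rebinds "password" itself
        self._password = "hash(" + plaintext + ")"
```

With the buggy class, `user.password = "secret"` raises AttributeError because the `password` property has no setter; the rename makes assignment go through the hashing setter as intended.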
[jira] [Assigned] (AIRFLOW-2462) airflow.contrib.auth.backends.password_auth.PasswordUser exists bug
[ https://issues.apache.org/jira/browse/AIRFLOW-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2462:
Assignee: (was: froginwell)

Key: AIRFLOW-2462
URL: https://issues.apache.org/jira/browse/AIRFLOW-2462
Project: Apache Airflow
Issue Type: Bug
Components: authentication, contrib
Affects Versions: 1.9.0
Reporter: froginwell
Priority: Blocker

PasswordUser:
{code}
@password.setter
def _set_password(self, plaintext):
    self._password = generate_password_hash(plaintext, 12)
    if PY3:
        self._password = str(self._password, 'utf-8')
{code}
_set_password should be renamed to password.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1501) Google Cloud Storage delete operator
[ https://issues.apache.org/jira/browse/AIRFLOW-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1501: -- Assignee: Guillermo Rodríguez Cano (was: Yu Ishikawa) > Google Cloud Storage delete operator > > > Key: AIRFLOW-1501 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1501 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, operators >Reporter: Yu Ishikawa >Assignee: Guillermo Rodríguez Cano >Priority: Major > > h2. Goals > - Implement a new feature to delete objects on Google Cloud Storage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2352) Airflow isn't picking up earlier periods after DAG definition update
[ https://issues.apache.org/jira/browse/AIRFLOW-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2352: -- Assignee: Alex Lumpov > Airflow isn't picking up earlier periods after DAG definition update > > > Key: AIRFLOW-2352 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2352 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.9.0 >Reporter: Slawomir Krysiak >Assignee: Alex Lumpov >Priority: Major > Attachments: Screen Shot 2018-04-20 at 5.04.12 PM.png > > > Hi, > > It would be nice to be able to modify the period range (a.k.a. start_date) per > dag/subdag and have the scheduler pick it up. Not sure if that should be a > feature request or a bug, but I was under the assumption that it works that > way already. But for some reason it doesn't seem to be the case in 1.9.0 > which I'm using for my POC. Attaching my message from gitter... BTW, it seems > that there are many questions coming up on that channel but they don't seem > to be addressed promptly. > Thanks, > Slawomir > > P.S. It would probably be helpful to be able to submit an 'end_date' > parameter to DAG/SubDAG... there may be datasets that are no longer produced, > yet they still have some period range extracted. Evolving transformation > pipelines would definitely benefit from this kind of option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-273) Vectorized Logos
[ https://issues.apache.org/jira/browse/AIRFLOW-273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-273: - Assignee: Ivan Vitoria (was: George Leslie-Waksman) > Vectorized Logos > > > Key: AIRFLOW-273 > URL: https://issues.apache.org/jira/browse/AIRFLOW-273 > Project: Apache Airflow > Issue Type: Improvement >Reporter: George Leslie-Waksman >Assignee: Ivan Vitoria >Priority: Trivial > > There has been interest on the mailing list in a SVG version of the logo. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2287) Missing and incorrect license headers
[ https://issues.apache.org/jira/browse/AIRFLOW-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2287: -- Assignee: Bolke de Bruin > Missing and incorrect license headers > - > > Key: AIRFLOW-2287 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2287 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Bolke de Bruin >Assignee: Bolke de Bruin >Priority: Blocker > Fix For: 2.0.0 > > > * {color:#454545}a few files are missing licenses, like docs/Makefile{color} > * {color:#454545}please fix the year in the notice ("2016 and onwards" makes it a > little hard to work out when copyright would expire){color} > * {color:#454545}LICENSE is OK but some license texts are missing, i.e. > Bootstrap Toggle, normalize.css, parallel.js. Note that in order to comply > with the terms of the licenses the full text of the license MUST be > included.{color} > * {color:#454545}also note that ace and d3 are under a BSD 3 clause not BSD > 2 clause{color} > * {color:#454545} A large number of files are missing the correct ASF > header. (see below){color} > ** {color:#454545}Re incorrect headers: not perfect, but shows the scope of the > issue:{color} > *** {color:#454545} find . -name "*.*" -exec grep "contributor license" {} > \; -print | wc{color} > *** {color:#454545} find . -name "*.*" -exec grep > "http://www.apache.org/licenses/LICENSE-2.0" {} \; -print | wc{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2118) get_pandas_df does not always pass a list of rows to be parsed
[ https://issues.apache.org/jira/browse/AIRFLOW-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2118: -- Assignee: Diane Ivy > get_pandas_df does not always pass a list of rows to be parsed > -- > > Key: AIRFLOW-2118 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2118 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, hooks >Affects Versions: 1.9.0 > Environment: pandas-gbq 0.3.1 >Reporter: Diane Ivy >Assignee: Diane Ivy >Priority: Minor > Labels: easyfix > Original Estimate: 1h > Remaining Estimate: 1h > > In get_pandas_df, if only one page of results is returned, the loop pops off > individual rows instead of pages, so gbq_parse_data parses the data incorrectly. > {{while len(pages) > 0:}} > {{ page = pages.pop()}} > {{ dataframe_list.append(gbq_parse_data(schema, page))}} > Possible solution: > {{from google.cloud import bigquery}} > {{if isinstance(pages[0], bigquery.table.Row):}} > {{ pages = [pages]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
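A minimal sketch of the proposed guard, using a stand-in `Row` class in place of `google.cloud.bigquery.table.Row` (the class and function names here are illustrative, not the hook's actual API):

```python
class Row:
    """Stand-in for google.cloud.bigquery.table.Row."""
    def __init__(self, values):
        self.values = values

def normalize_pages(pages):
    """If the API handed back a single bare page (a list of Row objects),
    wrap it in a list so the pop() loop consumes whole pages, not rows."""
    if pages and isinstance(pages[0], Row):
        return [pages]
    return pages

single_page = [Row((1,)), Row((2,))]
assert normalize_pages(single_page) == [single_page]    # bare page gets wrapped
assert normalize_pages([single_page]) == [single_page]  # paged input passes through
```

The check hinges on the element type: a list of pages has lists at index 0, while a bare page has `Row` objects there.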
[jira] [Assigned] (AIRFLOW-2146) Initialize default Google BigQuery Connection with valid conn_type & Fix broken DBApiHook
[ https://issues.apache.org/jira/browse/AIRFLOW-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2146: -- Assignee: (was: Kaxil Naik) > Initialize default Google BigQuery Connection with valid conn_type & Fix > broken DBApiHook > - > > Key: AIRFLOW-2146 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2146 > Project: Apache Airflow > Issue Type: Task > Components: contrib, gcp >Reporter: Kaxil Naik >Priority: Major > Fix For: 1.10.0 > > > `airflow initdb` creates a connection with conn_id='bigquery_default' and > conn_type='bigquery'. However, bigquery is not a valid conn_type, according > to models.Connection._types, and BigQuery connections should use the > google_cloud_platform conn_type. > Also as [renanleme|https://github.com/renanleme] mentioned > [here|https://github.com/apache/incubator-airflow/pull/3031#issuecomment-368132910] > the dags he has created are broken when he is using `get_records()` from > BigQueryHook which is extended from DbApiHook. 
> *Error Log*: > {code} > Traceback (most recent call last): > File "/src/apache-airflow/airflow/models.py", line 1519, in _run_raw_task > result = task_copy.execute(context=context) > File "/airflow/dags/lib/operators/test_operator.py", line 21, in execute > records = self._get_db_hook(self.source_conn_id).get_records(self.sql) > File "/src/apache-airflow/airflow/hooks/base_hook.py", line 92, in > get_records > raise NotImplementedError() > {code} > *Dag*: > {code:python} > from datetime import datetime > from airflow import DAG > from lib.operators.test_operator import TestOperator > default_args = { > 'depends_on_past': False, > 'start_date': datetime(2018, 2, 21), > } > dag = DAG( > 'test_dag', > default_args=default_args, > schedule_interval='0 6 * * *' > ) > sql = ''' > SELECT id from YOUR_BIGQUERY_TABLE limit 10 > ''' > compare_grouped_event = TestOperator( > task_id='test_operator', > source_conn_id='gcp_airflow', > sql=sql, > dag=dag > ) > {code} > *Operator*: > {code:python} > from airflow.hooks.base_hook import BaseHook > from airflow.models import BaseOperator > from airflow.utils.decorators import apply_defaults > class TestOperator(BaseOperator): > @apply_defaults > def __init__( > self, > sql, > source_conn_id=None, > *args, **kwargs): > super(TestOperator, self).__init__(*args, **kwargs) > self.sql = sql > self.source_conn_id = source_conn_id > def execute(self, context=None): > records = self._get_db_hook(self.source_conn_id).get_records(self.sql) > self.log.info('Fetched records from source') > @staticmethod > def _get_db_hook(conn_id): > return BaseHook.get_hook(conn_id=conn_id) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
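A simplified sketch of why the error surfaces as NotImplementedError (stand-in classes and a toy registry, not Airflow's real dispatch): when the connection's conn_type is not recognized, the resolved hook's get_records falls through to the deliberately unimplemented base method.

```python
class BaseHook:
    """Stand-in base: get_records is deliberately unimplemented."""
    def get_records(self, sql):
        raise NotImplementedError()

class GoogleCloudPlatformHook(BaseHook):
    """Stand-in for a hook registered under a valid conn_type."""
    def get_records(self, sql):
        return [("stub-row",)]  # illustrative result, not real query output

def get_hook(conn_type):
    # Toy dispatch: an unrecognized conn_type such as 'bigquery'
    # falls back to the bare base class.
    registry = {"google_cloud_platform": GoogleCloudPlatformHook}
    return registry.get(conn_type, BaseHook)()

assert get_hook("google_cloud_platform").get_records("SELECT 1") == [("stub-row",)]
```

Under this reading, fixing `initdb` to create bigquery_default with conn_type='google_cloud_platform' routes the call to a hook that actually implements get_records.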
[jira] [Assigned] (AIRFLOW-2058) Scheduler uses MainThread for DAG file processing
[ https://issues.apache.org/jira/browse/AIRFLOW-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2058: -- Assignee: Yang Pan > Scheduler uses MainThread for DAG file processing > - > > Key: AIRFLOW-2058 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2058 > Project: Apache Airflow > Issue Type: Bug > Components: DAG >Affects Versions: 1.9.0 > Environment: Ubuntu, Airflow 1.9, Sequential executor >Reporter: Yang Pan >Assignee: Yang Pan >Priority: Blocker > > By reading the [source code|https://github.com/apache/incubator-airflow/blob/61ff29e578d1121ab4606fe122fb4e2db8f075b9/airflow/utils/dag_processing.py#L538] > it appears the scheduler will process each DAG file, either a .py or .zip, > using a new process. > > If I understand correctly, in theory what should happen when > processing a .zip file is that the dedicated process will add the .zip file > to the PYTHONPATH, and load the file's module and dependencies. When the DAG > read is done, the process gets destroyed. And since the PYTHONPATH is > process-scoped, it won't pollute other processes. > > However, by printing out the thread and process ids, it looks like the Airflow > scheduler can sometimes accidentally pick up the main process instead of > creating a new one, and that's when the collision happens. > > Here is a snippet of the PYTHONPATH while advanced_dag_dependency-1.zip is being > processed. As you can see, when it's executed by the MainThread, it contains other > .zip files. When it's using a dedicated thread, only the required .zip is added.
> > sys.path :['/root/airflow/dags/yang_subdag_2.zip', > '/root/airflow/dags/yang_subdag_2.zip', > '/root/airflow/dags/yang_subdag_1.zip', > '/root/airflow/dags/yang_subdag_1.zip', > '/root/airflow/dags/advanced_dag_dependency-2.zip', > '/root/airflow/dags/advanced_dag_dependency-2.zip', > '/root/airflow/dags/advanced_dag_dependency-1.zip', > '/root/airflow/dags/advanced_dag_dependency-1.zip', > '/root/airflow/dags/yang_subdag_1', '/usr/local/bin', '/usr/lib/python2.7', > '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', > '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', > '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', > '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config', > '/root/airflow/dags', '/root/airflow/plugins'] > Print from MyFirstOperator in Dag 1 > process id: 5059 > thread id: <_MainThread(*MainThread*, started 140339858560768)> > > sys.path :[u'/root/airflow/dags/advanced_dag_dependency-1.zip', > '/usr/local/bin', '/usr/lib/python2.7', > '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', > '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', > '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', > '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config', > '/root/airflow/dags', '/root/airflow/plugins'] > Print from MyFirstOperator in Dag 1 > process id: 5076 > thread id: <_MainThread(*DagFileProcessor283*, started 140137838294784)> -- This message was sent by Atlassian JIRA (v7.6.3#76005)
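The isolation the scheduler relies on can be demonstrated directly: sys.path additions made in a child interpreter die with that process and never leak back into the parent (the '/dags/a.zip' path below is purely illustrative).

```python
# Process-scoped sys.path demo: mutate the path in a child interpreter,
# then confirm the parent's path is untouched.
import subprocess
import sys

child = subprocess.run(
    [sys.executable, "-c",
     "import sys; sys.path.insert(0, '/dags/a.zip'); "
     "print('/dags/a.zip' in sys.path)"],
    capture_output=True, text=True, check=True,
)
assert child.stdout.strip() == "True"   # the child saw the added entry
assert "/dags/a.zip" not in sys.path    # the parent's path stays clean
```

This is exactly the guarantee that breaks when a DAG file is processed in the main process instead of a fresh one: the mutation then lands in the long-lived interpreter's path.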
[jira] [Assigned] (AIRFLOW-2030) dbapi_hook KeyError: 'i' at line 225
[ https://issues.apache.org/jira/browse/AIRFLOW-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2030: -- Assignee: Manish Kumar Untwal > dbapi_hook KeyError: 'i' at line 225 > > > Key: AIRFLOW-2030 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2030 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0 >Reporter: Manish Kumar Untwal >Assignee: Manish Kumar Untwal >Priority: Major > > When zero rows are inserted, the loop variable 'i' is never defined, so the > logger throws a KeyError for local variable 'i' at line 225 in dbapi_hook.py -- This message was sent by Atlassian JIRA (v7.6.3#76005)
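A hedged reconstruction of the failure mode (simplified stand-ins, not the real hook code): formatting a log message with **locals() raises KeyError('i') when the insert loop never ran, because the loop variable was never bound.

```python
def load_summary_buggy(rows):
    for i, _row in enumerate(rows, 1):
        pass  # ... execute one insert per row ...
    # With zero rows, locals() has no 'i', so format(**locals()) raises KeyError.
    return "Done loading. Loaded a total of {i} rows".format(**locals())

def load_summary_fixed(rows):
    i = 0  # bind the counter up front so zero-row loads can still be logged
    for i, _row in enumerate(rows, 1):
        pass
    return "Done loading. Loaded a total of {i} rows".format(**locals())

assert load_summary_fixed([]) == "Done loading. Loaded a total of 0 rows"
assert load_summary_fixed([("a",), ("b",)]).endswith("2 rows")
```

Initializing the counter before the loop is the one-line fix the report implies.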
[jira] [Assigned] (AIRFLOW-1582) Improve logging structure of Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1582: -- Assignee: Fokko Driesprong > Improve logging structure of Airflow > > > Key: AIRFLOW-1582 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1582 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong > > Hi, > I would like to improve the logging within Airflow. Currently the logging > lacks consistency across the project. I would like to: > - Remove airflow/utils/logging.py and move everything to /airflow/utils/log/* > - Initialise local loggers with the name of the class > - Move the settings of the logging to one central place > - Remove setting explicit logging levels within the code > Future PRs: > - Remove verbose boolean settings, which make no sense; if you want more > verbose logging you should get it by increasing the logging verbosity, not > by setting a boolean variable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (AIRFLOW-1463) Scheduler does not reschedule tasks in QUEUED state
[ https://issues.apache.org/jira/browse/AIRFLOW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-1463: -- Assignee: (was: Stanislav Pak) > Scheduler does not reschedule tasks in QUEUED state > --- > > Key: AIRFLOW-1463 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1463 > Project: Apache Airflow > Issue Type: Improvement > Components: cli > Environment: Ubuntu 14.04 > Airflow 1.8.0 > SQS backed task queue, AWS RDS backed meta storage > DAG folder is synced by a script on code push: the archive is downloaded from s3, > unpacked, moved, and an install script is run. The airflow executable is replaced with > a symlink pointing to the latest version of the code; no airflow processes are > restarted. >Reporter: Stanislav Pak >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > Our pipeline-related code is deployed almost simultaneously on all airflow > boxes: the scheduler+webserver box and the worker boxes. A common python package is > deployed on those boxes on every other code push (3-5 deployments per hour). > Due to installation specifics, a DAG that imports a module from that package > might fail. If the DAG import fails when a worker runs a task, the task is still > removed from the queue but its state is not changed, so in this case the > task stays in the QUEUED state forever. > Besides the described case, there is a scenario where it happens because of DAG > update lag in the scheduler. A task can be scheduled with the old DAG and a worker can > run the task with the new DAG that fails to import. > There might be other scenarios where it happens. > Proposal: > Catch errors when importing the DAG on task run and clear the task instance state if > the import fails. This should fix transient issues of this kind. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
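The proposal above can be sketched as follows (an illustrative API with a dict standing in for the task instance, not Airflow's actual models): catch the import failure at run time and reset the state so the scheduler can pick the task up again instead of leaving it QUEUED forever.

```python
def run_task(import_dag, task_instance):
    """Try to import the DAG for a task; on failure, clear the task
    instance state so the scheduler reschedules it."""
    try:
        dag = import_dag()
    except ImportError:
        task_instance["state"] = None  # cleared -> eligible for rescheduling
        return None
    return dag

def broken_import():
    # simulates a DAG file whose dependency was replaced mid-deployment
    raise ImportError("module moved by a concurrent deployment")

ti = {"state": "queued"}
assert run_task(broken_import, ti) is None
assert ti["state"] is None  # no longer stuck in QUEUED
```

The key point is that the failure path mutates the state record rather than silently dropping the task from the queue.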
[jira] [Assigned] (AIRFLOW-342) exception in 'airflow scheduler' : Connection reset by peer
[ https://issues.apache.org/jira/browse/AIRFLOW-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-342: - Assignee: Hila Visan > exception in 'airflow scheduler' : Connection reset by peer > > > Key: AIRFLOW-342 > URL: https://issues.apache.org/jira/browse/AIRFLOW-342 > Project: Apache Airflow > Issue Type: Bug > Components: celery, scheduler >Affects Versions: Airflow 1.7.1.3 > Environment: OS: Red Hat Enterprise Linux Server 7.2 (Maipo) > Python: 2.7.5 > Airflow: 1.7.1.3 >Reporter: Hila Visan >Assignee: Hila Visan > > 'airflow scheduler' command throws an exception when running it. > Despite the exception, the workers run the tasks from the queues as expected. > Error details: > > [2016-06-30 19:00:10,130] {jobs.py:758} ERROR - [Errno 104] Connection reset > by peer > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 755, in > _execute > executor.heartbeat() > File "/usr/lib/python2.7/site-packages/airflow/executors/base_executor.py", > line 107, in heartbeat > self.sync() > File > "/usr/lib/python2.7/site-packages/airflow/executors/celery_executor.py", line > 74, in sync > state = async.state > File "/usr/lib/python2.7/site-packages/celery/result.py", line 394, in state > return self._get_task_meta()['status'] > File "/usr/lib/python2.7/site-packages/celery/result.py", line 339, in > _get_task_meta > return self._maybe_set_cache(self.backend.get_task_meta(self.id)) > File "/usr/lib/python2.7/site-packages/celery/backends/amqp.py", line 163, > in get_task_meta > binding.declare() > File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 521, in > declare >self.exchange.declare(nowait) > File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 174, in > declare > nowait=nowait, passive=passive, > File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 615, in > exchange_declare > self._send_method((40, 10), args) > File 
"/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, > in _send_method > self.channel_id, method_sig, args, content, > File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, > in write_method > write_frame(1, channel, payload) > File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in > write_frame > frame_type, channel, size, payload, 0xce, > File "/usr/lib64/python2.7/socket.py", line 224, in meth > return getattr(self._sock,name)(*args) > error: [Errno 104] Connection reset by peer > [2016-06-30 19:00:10,131] {jobs.py:759} ERROR - Tachycardia! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (AIRFLOW-93) Allow specifying multiple task execution deltas for ExternalTaskSensors
[ https://issues.apache.org/jira/browse/AIRFLOW-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-93: Assignee: Jonas Esser (was: Bence Nagy) > Allow specifying multiple task execution deltas for ExternalTaskSensors > --- > > Key: AIRFLOW-93 > URL: https://issues.apache.org/jira/browse/AIRFLOW-93 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: Airflow 1.7.0 >Reporter: Bence Nagy >Assignee: Jonas Esser >Priority: Minor > > I have some {{ExternalTaskSensor}}s with a schedule interval of 1 hour, where > the task depended upon has a schedule interval of 10 minutes. Right now I'm > depending only on the HH:50 execution, but it would be nice if I could > specify a range that I need all executions from HH:00 to HH:50 successful; > otherwise if the depended upon tasks are executed out of order the sensor > will pass even though I don't have data for the earlier parts of the hour yet. > A workaround would be to have one sensor for each 10 minutes of the hour, but > that's too nasty for me. Especially if my sensor's schedule interval would be > 1 day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
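The range the reporter wants can be expressed with plain datetime arithmetic (no Airflow API used here; the helper name is illustrative): for one hourly run, enumerate the six 10-minute executions from HH:00 to HH:50 that the sensor would need to see succeed.

```python
from datetime import datetime, timedelta

def ten_minute_executions(execution_date):
    """For an hourly execution date, list the HH:00 .. HH:50 executions
    of the upstream 10-minute DAG that it depends on."""
    base = execution_date.replace(minute=0, second=0, microsecond=0)
    return [base + timedelta(minutes=10 * k) for k in range(6)]

deps = ten_minute_executions(datetime(2016, 5, 1, 14, 0))
assert len(deps) == 6
assert deps[0] == datetime(2016, 5, 1, 14, 0)
assert deps[-1] == datetime(2016, 5, 1, 14, 50)
```

A sensor that accepted such a list (or a callable producing it) would replace the six separate per-10-minute sensors the reporter calls "too nasty".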
[jira] [Assigned] (AIRFLOW-93) Allow specifying multiple task execution deltas for ExternalTaskSensors
[ https://issues.apache.org/jira/browse/AIRFLOW-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-93: Assignee: (was: Bence Nagy) > Allow specifying multiple task execution deltas for ExternalTaskSensors > --- > > Key: AIRFLOW-93 > URL: https://issues.apache.org/jira/browse/AIRFLOW-93 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: Airflow 1.7.0 >Reporter: Bence Nagy >Priority: Minor > > I have some {{ExternalTaskSensor}}s with a schedule interval of 1 hour, where > the task depended upon has a schedule interval of 10 minutes. Right now I'm > depending only on the HH:50 execution, but it would be nice if I could > specify a range that I need all executions from HH:00 to HH:50 successful; > otherwise if the depended upon tasks are executed out of order the sensor > will pass even though I don't have data for the earlier parts of the hour yet. > A workaround would be to have one sensor for each 10 minutes of the hour, but > that's too nasty for me. Especially if my sensor's schedule interval would be > 1 day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (AIRFLOW-93) Allow specifying multiple task execution deltas for ExternalTaskSensors
[ https://issues.apache.org/jira/browse/AIRFLOW-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-93: Assignee: Bence Nagy > Allow specifying multiple task execution deltas for ExternalTaskSensors > --- > > Key: AIRFLOW-93 > URL: https://issues.apache.org/jira/browse/AIRFLOW-93 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: Airflow 1.7.0 >Reporter: Bence Nagy >Assignee: Bence Nagy >Priority: Minor > > I have some {{ExternalTaskSensor}}s with a schedule interval of 1 hour, where > the task depended upon has a schedule interval of 10 minutes. Right now I'm > depending only on the HH:50 execution, but it would be nice if I could > specify a range that I need all executions from HH:00 to HH:50 successful; > otherwise if the depended upon tasks are executed out of order the sensor > will pass even though I don't have data for the earlier parts of the hour yet. > A workaround would be to have one sensor for each 10 minutes of the hour, but > that's too nasty for me. Especially if my sensor's schedule interval would be > 1 day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)