[jira] [Assigned] (AIRFLOW-3501) Add config option to load dags in an image with the kubernetes executor.

2018-12-13 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-3501:
--

Assignee: Kevin Pullin

> Add config option to load dags in an image with the kubernetes executor.
> 
>
> Key: AIRFLOW-3501
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3501
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: kubernetes
>Reporter: Kevin Pullin
>Assignee: Kevin Pullin
>Priority: Major
>
> Currently the airflow kubernetes executor forces loading dags either from a 
> volume claim or an init container.  There should be an option to bypass these 
> settings and instead use dags packaged into the running image.
> The motivation for this change is to allow for an airflow image to be built 
> and released via a CI/CD pipeline upon a new commit to a dag repository.  For 
> example, given a new git commit to a dag repo, a CI/CD server can build an 
> airflow docker image, run tests against the current dags, and finally push 
> the entire bundle as a single, complete, well-known unit to kubernetes.
> There's no need to worry about a git init container failing, or to maintain 
> a separate pipeline to update dags on a shared volume, etc.  And if issues 
> arise from an update, the configuration can easily be rolled back to the 
> prior version of the image.
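For reference, a sketch of what such a configuration could look like. The section and option names below follow the `[kubernetes]` block of airflow.cfg; `dags_in_image` is the name the eventual change used, but confirm it against your Airflow version before relying on it. The repository and tag values are placeholders for whatever the CI/CD pipeline publishes.

```ini
[kubernetes]
# Skip both the volume claim and the git init container; the DAGs are
# already baked into the worker image built by the CI/CD pipeline.
dags_in_image = True
worker_container_repository = my-registry/airflow-with-dags
worker_container_tag = build-1234
```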



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-3452) Cannot view dags at /home page

2018-12-10 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-3452:
--

Assignee: (was: Jinhui Zhang)

> Cannot view dags at /home page
> --
>
> Key: AIRFLOW-3452
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3452
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Jinhui Zhang
>Priority: Blocker
>
> I checked out the latest master branch(commit 
> {{[9dce1f0|https://github.com/apache/incubator-airflow/commit/9dce1f0740f69af0ee86709a1a34a002b245aa3e]}})
>  and restarted my Airflow webserver. But I cannot view any dag at the home 
> page. I inspected the frontend code and found there's a 
> {{style="display:none;"}} on the {{main-content}}, and the source code says 
> so at 
> [https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/templates/airflow/dags.html#L31]
>  . Is this a known issue? How should I fix it? 





[jira] [Assigned] (AIRFLOW-3452) Cannot view dags at /home page

2018-12-10 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-3452:
--

Assignee: Jinhui Zhang

> Cannot view dags at /home page
> --
>
> Key: AIRFLOW-3452
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3452
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Jinhui Zhang
>Assignee: Jinhui Zhang
>Priority: Blocker
>
> I checked out the latest master branch(commit 
> {{[9dce1f0|https://github.com/apache/incubator-airflow/commit/9dce1f0740f69af0ee86709a1a34a002b245aa3e]}})
>  and restarted my Airflow webserver. But I cannot view any dag at the home 
> page. I inspected the frontend code and found there's a 
> {{style="display:none;"}} on the {{main-content}}, and the source code says 
> so at 
> [https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/templates/airflow/dags.html#L31]
>  . Is this a known issue? How should I fix it? 





[jira] [Assigned] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2018-11-27 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-987:
-

Assignee: (was: Pratap20)

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Priority: Major
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5
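A minimal sketch of the fix being asked for: CLI arguments should take precedence over the configured defaults when the kinit command line is assembled. The helper name and defaults below are illustrative, not Airflow's actual API.

```python
# Build the kinit command so that --keytab / --principal from the CLI win
# over airflow.cfg defaults instead of being silently ignored.
def build_kinit_command(keytab=None, principal=None,
                        conf_keytab="airflow.keytab",
                        conf_principal="airflow",
                        ccache="/tmp/airflow_krb5_ccache",
                        reinit_frequency="3600m"):
    keytab = keytab or conf_keytab          # CLI value first, config fallback
    principal = principal or conf_principal
    return ["kinit", "-r", reinit_frequency,
            "-k", "-t", keytab, "-c", ccache, principal]

# With explicit arguments, the configured defaults must not leak through
# (the principal below is a placeholder, not the reporter's obfuscated one):
cmd = build_kinit_command(keytab="/home/rdautkha/.keytab",
                          principal="user@EXAMPLE.COM")
```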





[jira] [Assigned] (AIRFLOW-1856) How to allow airflow dags for concrete user(s) only?

2018-11-21 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1856:
--

Assignee: Lokesh Chinnaga

> How to allow airflow dags for concrete user(s) only?
> 
>
> Key: AIRFLOW-1856
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1856
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, ui, webapp
>Reporter: Ikar Pohorsky
>Assignee: Lokesh Chinnaga
>Priority: Major
>
> The problem is pretty simple. I need to limit airflow web users to see and 
> execute only certain DAGs and tasks.
> If possible, I'd prefer not to use 
> [Kerberos|https://airflow.incubator.apache.org/security.html#kerberos] nor 
> [OAuth|https://airflow.incubator.apache.org/security.html#oauth-authentication].
> The 
> [Multi-tenancy|https://airflow.incubator.apache.org/security.html#multi-tenancy]
>  option seems like the way to go, but I couldn't make it work the way I 
> expect.
> My current setup:
> * added airflow web users _test_ and _ikar_ via [Web Authentication / 
> Password|https://airflow.incubator.apache.org/security.html#password]
> * my unix username is _ikar_ with a home in _/home/ikar_
> * no _test_ unix user
> * airflow _1.8.2_ is installed in _/home/ikar/airflow_
> * added two DAGs with one task:
> ** one with _owner_ set to _ikar_
> ** one with _owner_ set to _test_
> * airflow.cfg:
> {code}
> [core]
> # The home folder for airflow, default is ~/airflow
> airflow_home = /home/ikar/airflow
> # The folder where your airflow pipelines live, most likely a
> # subfolder in a code repository
> # This path must be absolute
> dags_folder = /home/ikar/airflow-test/dags
> # The folder where airflow should store its log files
> # This path must be absolute
> base_log_folder = /home/ikar/airflow/logs
> # Airflow can store logs remotely in AWS S3 or Google Cloud Storage. Users
> # must supply a remote location URL (starting with either 's3://...' or
> # 'gs://...') and an Airflow connection id that provides access to the storage
> # location.
> remote_base_log_folder =
> remote_log_conn_id =
> # Use server-side encryption for logs stored in S3
> encrypt_s3_logs = False
> # DEPRECATED option for remote log storage, use remote_base_log_folder 
> instead!
> s3_log_folder =
> # The executor class that airflow should use. Choices include 
> # SequentialExecutor, LocalExecutor, CeleryExecutor 
> executor = SequentialExecutor
> # The SqlAlchemy connection string to the metadata database. 
> # SqlAlchemy supports many different database engine, more information
> # their website 
> sql_alchemy_conn = sqlite:home/ikar/airflow/airflow.db
> # The SqlAlchemy pool size is the maximum number of database connections
> # in the pool.
> sql_alchemy_pool_size = 5
> # The SqlAlchemy pool recycle is the number of seconds a connection
> # can be idle in the pool before it is invalidated. This config does
> # not apply to sqlite.
> sql_alchemy_pool_recycle = 3600
> # The amount of parallelism as a setting to the executor. This defines
> 

[jira] [Assigned] (AIRFLOW-1856) How to allow airflow dags for concrete user(s) only?

2018-11-21 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1856:
--

Assignee: (was: Lokesh Chinnaga)

> How to allow airflow dags for concrete user(s) only?
> 
>
> Key: AIRFLOW-1856
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1856
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, ui, webapp
>Reporter: Ikar Pohorsky
>Priority: Major
>
> The problem is pretty simple. I need to limit airflow web users to see and 
> execute only certain DAGs and tasks.
> If possible, I'd prefer not to use 
> [Kerberos|https://airflow.incubator.apache.org/security.html#kerberos] nor 
> [OAuth|https://airflow.incubator.apache.org/security.html#oauth-authentication].
> The 
> [Multi-tenancy|https://airflow.incubator.apache.org/security.html#multi-tenancy]
>  option seems like the way to go, but I couldn't make it work the way I 
> expect.
> My current setup:
> * added airflow web users _test_ and _ikar_ via [Web Authentication / 
> Password|https://airflow.incubator.apache.org/security.html#password]
> * my unix username is _ikar_ with a home in _/home/ikar_
> * no _test_ unix user
> * airflow _1.8.2_ is installed in _/home/ikar/airflow_
> * added two DAGs with one task:
> ** one with _owner_ set to _ikar_
> ** one with _owner_ set to _test_
> * airflow.cfg:
> {code}
> [core]
> # The home folder for airflow, default is ~/airflow
> airflow_home = /home/ikar/airflow
> # The folder where your airflow pipelines live, most likely a
> # subfolder in a code repository
> # This path must be absolute
> dags_folder = /home/ikar/airflow-test/dags
> # The folder where airflow should store its log files
> # This path must be absolute
> base_log_folder = /home/ikar/airflow/logs
> # Airflow can store logs remotely in AWS S3 or Google Cloud Storage. Users
> # must supply a remote location URL (starting with either 's3://...' or
> # 'gs://...') and an Airflow connection id that provides access to the storage
> # location.
> remote_base_log_folder =
> remote_log_conn_id =
> # Use server-side encryption for logs stored in S3
> encrypt_s3_logs = False
> # DEPRECATED option for remote log storage, use remote_base_log_folder 
> instead!
> s3_log_folder =
> # The executor class that airflow should use. Choices include 
> # SequentialExecutor, LocalExecutor, CeleryExecutor 
> executor = SequentialExecutor
> # The SqlAlchemy connection string to the metadata database. 
> # SqlAlchemy supports many different database engine, more information
> # their website 
> sql_alchemy_conn = sqlite:home/ikar/airflow/airflow.db
> # The SqlAlchemy pool size is the maximum number of database connections
> # in the pool.
> sql_alchemy_pool_size = 5
> # The SqlAlchemy pool recycle is the number of seconds a connection
> # can be idle in the pool before it is invalidated. This config does
> # not apply to sqlite.
> sql_alchemy_pool_recycle = 3600
> # The amount of parallelism as a setting to the executor. This defines
> # the max number of task 
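For the question being asked (the archived message is truncated above), Airflow 1.8's webserver did ship a switch aimed at exactly this: combined with password authentication, it hides DAGs whose owner differs from the logged-in username. Treat this snippet as a pointer to verify against the 1.8.2 default config rather than a tested answer.

```ini
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
# Show a logged-in user only the DAGs whose owner matches their username
filter_by_owner = True
```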

[jira] [Assigned] (AIRFLOW-3240) Airflow dags are not working (not starting t1)

2018-10-22 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-3240:
--

Assignee: (was: Ivan Vitoria)

> Airflow dags are not working (not starting t1)
> --
>
> Key: AIRFLOW-3240
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3240
> Project: Apache Airflow
>  Issue Type: Task
>  Components: DAG, DagRun
>Affects Versions: 1.8.0
>Reporter: Pandu
>Priority: Critical
>






[jira] [Assigned] (AIRFLOW-3240) Airflow dags are not working (not starting t1)

2018-10-22 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-3240:
--

Assignee: Ivan Vitoria

> Airflow dags are not working (not starting t1)
> --
>
> Key: AIRFLOW-3240
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3240
> Project: Apache Airflow
>  Issue Type: Task
>  Components: DAG, DagRun
>Affects Versions: 1.8.0
>Reporter: Pandu
>Assignee: Ivan Vitoria
>Priority: Critical
>






[jira] [Assigned] (AIRFLOW-2064) Polish timezone implementation

2018-10-12 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2064:
--

Assignee: Marcus Rehm

> Polish timezone implementation
> --
>
> Key: AIRFLOW-2064
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2064
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Assignee: Marcus Rehm
>Priority: Blocker
> Fix For: 1.10.0
>
>
> Couple of things are left over after moving to time zone support:
>  
>  # End_dates within dags should be converted to UTC by using the time zone of 
> start_date if naive
>  # Task instances that are instantiated without timezone information for 
> their execution_date should convert those to UTC by using the DAG's timezone 
> or the configured default time zone
>  # Some doc polishing
>  # Tests should be added that cover more of the edge cases
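Item 1 of the list can be sketched with the standard library; Airflow itself uses pendulum, so this is an illustration of the rule, not its implementation:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def make_aware_end_date(end_date, start_date):
    """If end_date is naive, localize it with start_date's zone, then to UTC."""
    if end_date.tzinfo is not None:
        return end_date.astimezone(timezone.utc)
    tz = start_date.tzinfo or timezone.utc
    return end_date.replace(tzinfo=tz).astimezone(timezone.utc)

start = datetime(2018, 1, 1, tzinfo=ZoneInfo("Europe/Amsterdam"))
end = datetime(2018, 6, 1, 12, 0)        # naive end_date from the DAG file
aware = make_aware_end_date(end, start)  # June in Amsterdam is UTC+2 -> 10:00 UTC
```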





[jira] [Assigned] (AIRFLOW-2064) Polish timezone implementation

2018-10-12 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2064:
--

Assignee: (was: Alex Lumpov)

> Polish timezone implementation
> --
>
> Key: AIRFLOW-2064
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2064
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.10.0
>
>
> Couple of things are left over after moving to time zone support:
>  
>  # End_dates within dags should be converted to UTC by using the time zone of 
> start_date if naive
>  # Task instances that are instantiated without timezone information for 
> their execution_date should convert those to UTC by using the DAG's timezone 
> or the configured default time zone
>  # Some doc polishing
>  # Tests should be added that cover more of the edge cases





[jira] [Assigned] (AIRFLOW-2064) Polish timezone implementation

2018-10-12 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2064:
--

Assignee: Alex Lumpov

> Polish timezone implementation
> --
>
> Key: AIRFLOW-2064
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2064
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Assignee: Alex Lumpov
>Priority: Blocker
> Fix For: 1.10.0
>
>
> Couple of things are left over after moving to time zone support:
>  
>  # End_dates within dags should be converted to UTC by using the time zone of 
> start_date if naive
>  # Task instances that are instantiated without timezone information for 
> their execution_date should convert those to UTC by using the DAG's timezone 
> or the configured default time zone
>  # Some doc polishing
>  # Tests should be added that cover more of the edge cases





[jira] [Assigned] (AIRFLOW-3134) Add New Operator - MySQLToS3TransformOperator

2018-10-02 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-3134:
--

Assignee: (was: Stefano Francavilla)

> Add New Operator - MySQLToS3TransformOperator
> -
>
> Key: AIRFLOW-3134
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3134
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Affects Versions: 1.10.0
>Reporter: Stefano Francavilla
>Priority: Minor
>  Labels: MissingFeature, operators
>
> Taking inspiration from the [S3Transform 
> Operator|https://github.com/apache/incubator-airflow/blob/master/airflow/operators/s3_file_transform_operator.py]
>  and from a use case I was working on in the past weeks, I was wondering if it 
> would be useful to add a new operator: "MySQLToS3TransformOperator".
> The operator would allow transferring (transformed) data resulting from a SELECT 
> statement to an S3 bucket.





[jira] [Assigned] (AIRFLOW-2794) Add delete support for Azure blob

2018-09-26 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2794:
--

Assignee: Bart Eijk

> Add delete support for Azure blob
> -
>
> Key: AIRFLOW-2794
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2794
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: hooks, operators
>Reporter: Bart Eijk
>Assignee: Bart Eijk
>Priority: Trivial
>
> As a developer, I would like to have the ability to create tasks that can 
> delete files in Azure blob storage.
> Nice to have: the ability to delete a "folder", i.e. a prefix.





[jira] [Assigned] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-17 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2747:
--

Assignee: (was: Stefan Seelmann)

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png
>
>
> By default sensors block a worker and just sleep between pokes. This is very 
> inefficient, especially when there are many long-running sensors.
> There is a hacky workaround: set a small timeout value and a high retry 
> number. But that has drawbacks:
>  * Errors raised by sensors are hidden and the sensor retries too often
>  * The sensor is retried in a fixed time interval (with optional exponential 
> backoff)
>  * There are many attempts and many log files are generated
>  I'd like to propose an explicit reschedule mechanism:
>  * A new "reschedule" flag for sensors, if set to True it will raise an 
> AirflowRescheduleException that causes a reschedule.
>  * AirflowRescheduleException contains the (earliest) re-schedule date.
>  * Reschedule requests are recorded in new `task_reschedule` table and 
> visualized in the Gantt view.
>  * A new TI dependency that checks if a sensor task is ready to be 
> re-scheduled.
> Advantages:
>  * This change is backward compatible. Existing sensors behave like before. 
> But it's possible to set the "reschedule" flag.
>  * The poke_interval, timeout, and soft_fail parameters are still respected 
> and used to calculate the next schedule time.
>  * Custom sensor implementations can even define the next sensible schedule 
> date by raising AirflowRescheduleException themselves.
>  * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
> rescheduled when the time is reached.
>  * This mechanism can also be used by non-sensor operators (but then the new 
> ReadyToRescheduleDep has to be added to deps or BaseOperator).
> Design decisions and caveats:
>  * When handling AirflowRescheduleException the `try_number` is decremented. 
> That means that subsequent runs use the same try number and write to the same 
> log file.
>  * Sensor TI dependency check now depends on `task_reschedule` table. However 
> only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
> Open questions and TODOs:
>  * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting 
> the state back to `NONE`? This would require more changes in scheduler code 
> and especially in the UI, but the state of a task would be more explicit and 
> more transparent to the user.
>  * Add example/test for a non-sensor operator
>  * Document the new feature





[jira] [Assigned] (AIRFLOW-2156) Parallelize Celery Executor

2018-09-09 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2156:
--

Assignee: Kevin Yang  (was: Dan Davydov)

> Parallelize Celery Executor
> ---
>
> Key: AIRFLOW-2156
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2156
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: celery
>Reporter: Dan Davydov
>Assignee: Kevin Yang
>Priority: Major
>
> The CeleryExecutor doesn't currently support parallel execution to check task 
> states since Celery does not support this. This can greatly slow down the 
> Scheduler loops since each request to check a task's state is a network 
> request.
>  
> The Celery Executor should parallelize these requests.
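Since each state lookup is a network round trip, the gist of the change is to issue the lookups concurrently instead of in a serial loop. `fetch_state` below is a stand-in for Celery's per-task state query; this is a sketch, not the actual executor code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_states(task_ids, fetch_state, max_workers=16):
    """Return {task_id: state}, querying up to max_workers ids in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so zip pairs each id with its state
        return dict(zip(task_ids, pool.map(fetch_state, task_ids)))

states = fetch_states(["t1", "t2", "t3"], lambda tid: "SUCCESS")
```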





[jira] [Assigned] (AIRFLOW-1555) Backfill job gets killed 1 hour after starting

2018-09-05 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1555:
--

Assignee: Shreyas Joshi

> Backfill job gets killed 1 hour after starting
> --
>
> Key: AIRFLOW-1555
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1555
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: backfill
>Affects Versions: 1.8.1
> Environment: Airflow 1.8.1
> Celery 3.1.23 with one coordinator, redis and 3 workers
> Python 3.5.2
> Debian GNU/Linux 8.9 (jessie)
> snakebite uninstalled because it does not work with Python 3.5.2
> MySQL 5.6
>Reporter: Shreyas Joshi
>Assignee: Shreyas Joshi
>Priority: Major
> Fix For: 1.10.0
>
>
> *What happens?*
> After running for an hour, tasks in a backfill die. The task log shows:
> {code}
> ...
> [2017-08-31 06:48:06,425] {jobs.py:2172} WARNING - Recorded pid 5451 is not a 
> descendant of the current pid 21571
> [2017-08-31 06:48:11,884] {jobs.py:2179} WARNING - State of this instance has 
> been externally set to failed. Taking the poison pill. So long.
> [2017-08-31 06:48:11,892] {helpers.py:220} WARNING - Terminating descendant 
> processes of [] PID: 5451
> [2017-08-31 06:48:11,892] {helpers.py:224} WARNING - Terminating descendant 
> process [] PID: 5459
> [2017-08-31 06:48:11,896] {helpers.py:231} WARNING - Waiting up to 5s for 
> processes to exit...
> ...
> {code}
> The backfill log shows:
> {code}
> ...
> [2017-08-31 11:23:44,025] {jobs.py:1729} ERROR - Executor reports task 
> instance  
> finished (failed) although the task says its running. Was the task killed 
> externally?
> [2017-08-31 11:23:44,025] {models.py:1427} ERROR - Executor reports task 
> instance  [running]> finished (failed) although the task says its running. Was the task 
> killed externally?
> ...
> {code}
> The Celery UI has the following exception, but status shows "success"
> {code}
> Traceback (most recent call last):
>   File 
> "/data/airflow-sources/.venv/lib/python3.5/site-packages/airflow/executors/celery_executor.py",
>  line 56, in execute_command
> subprocess.check_call(command, shell=True)
>   File "/usr/share/pyenv/versions/3.5.2/lib/python3.5/subprocess.py", line 
> 581, in check_call
> raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command 'airflow run dag_name task_name 
> 2017-08-30T02:00:00 --pickle 14 --local' returned non-zero exit status 1
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/data/airflow-sources/.venv/lib/python3.5/site-packages/celery/app/trace.py",
>  line 240, in trace_task
> R = retval = fun(*args, **kwargs)
>   File 
> "/data/airflow-sources/.venv/lib/python3.5/site-packages/celery/app/trace.py",
>  line 438, in __protected_call__
> return self.run(*args, **kwargs)
>   File 
> "/data/airflow-sources/.venv/lib/python3.5/site-packages/airflow/executors/celery_executor.py",
>  line 59, in execute_command
> raise AirflowException('Celery command failed')
> airflow.exceptions.AirflowException: Celery command failed
> {code}
> The tasks have timeouts explicitly set to 6 hours and SLA set to 5 hours. In 
> the course of debugging this I also set dagrun_timeout to 6 hours. It did not 
> make a difference.
> Here is a thread on [stackoverflow | 
> https://stackoverflow.com/questions/44274381/airflow-long-running-task-in-subdag-marked-as-failed-after-an-hour]
>  that talks about a very similar issue.
> These tasks run fine on our older Airflow 1.7. This is currently blocking our 
> upgrade.
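One plausible cause, not confirmed in this thread: with a Redis broker, Celery's `visibility_timeout` defaults to one hour, after which a still-unacknowledged task is redelivered and the original attempt is failed externally, matching the symptoms. A common mitigation is to raise it past the longest expected task:

```python
# Celery configuration fragment (Celery 3.x, Redis broker). 21600 s matches
# the 6-hour task timeout mentioned above; adjust to your longest task.
broker_transport_options = {"visibility_timeout": 6 * 3600}
```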





[jira] [Assigned] (AIRFLOW-2062) Support fine-grained Connection encryption

2018-09-04 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2062:
--

Assignee: Jasper Kahn

> Support fine-grained Connection encryption
> --
>
> Key: AIRFLOW-2062
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Wilson Lian
>Assignee: Jasper Kahn
>Priority: Minor
>
> This effort targets containerized tasks (e.g., those launched by 
> KubernetesExecutor). Under that paradigm, each task could potentially operate 
> under different credentials, and fine-grained Connection encryption will 
> enable an administrator to restrict which connections can be accessed by 
> which tasks.





[jira] [Assigned] (AIRFLOW-1252) Experimental API - exception when conf is present in JSON body

2018-06-27 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1252:
--

Assignee: (was: Sergio Herrera)

> Experimental API - exception when conf is present in JSON body
> --
>
> Key: AIRFLOW-1252
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1252
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: Airflow 1.8, 1.8.1
>Reporter: Sergio Herrera
>Priority: Major
>  Labels: api
>
> When someone calls the endpoint _POST 
> :/api/experimental/dags//dag_runs {}_, Airflow never runs 
> that request if its body contains _conf_.
> This occurs due to a type mismatch when calling the function 
> _trigger_dag()_, which is also used by the *CLI*. That function performs 
> _json.loads(conf)_ because from the CLI the type of conf is _string_, but on 
> the other side, from the *experimental API*, that type is _dict_ (because the 
> _Json_ body is parsed beforehand to extract all data, such as execution_date).
> There are two possibilities:
> 1. Find every use of the _trigger_dag()_ function and move the _Json_ parsing 
> outside the function.
> 2. In the *experimental API*, serialize conf back to a string (with _json.dumps()_) 
> so that _trigger_dag()_ can transform it into a _dict_.
> I have implemented the second option, so I can make a PR with it if you 
> want.
> Thank you a lot
> EDIT: Also, there are currently no tests that use conf in the Json passed 
> through the request.
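Option 2 above can be sketched as follows; `trigger_dag` here is a simplified stand-in for Airflow's function, kept only to show the type round-trip:

```python
import json

def trigger_dag(dag_id, conf=None):
    # The real function does json.loads(conf) because the CLI passes a string
    run_conf = json.loads(conf) if conf else None
    return {"dag_id": dag_id, "conf": run_conf}

def api_trigger_dag(dag_id, data):
    conf = data.get("conf")
    if isinstance(conf, dict):   # the experimental API already parsed the body
        conf = json.dumps(conf)  # serialize back so trigger_dag gets a string
    return trigger_dag(dag_id, conf=conf)

run = api_trigger_dag("example_dag", {"conf": {"key": "value"}})
```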





[jira] [Assigned] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running

2018-06-12 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1104:
--

Assignee: (was: Tao Feng)

> Concurrency check in scheduler should count queued tasks as well as running
> ---
>
> Key: AIRFLOW-1104
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1104
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: see https://github.com/apache/incubator-airflow/pull/2221
> "Tasks with the QUEUED state should also be counted below, but for now we 
> cannot count them. This is because there is no guarantee that queued tasks in 
> failed dagruns will or will not eventually run and queued tasks that will 
> never run will consume slots and can stall a DAG. Once we can guarantee that 
> all queued tasks in failed dagruns will never run (e.g. make sure that all 
> running/newly queued TIs have running dagruns), then we can include QUEUED 
> tasks here, with the constraint that they are in running dagruns."
>Reporter: Alex Guziel
>Priority: Minor
>
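The rule quoted in the Environment field can be sketched as a counting function: QUEUED counts toward the concurrency limit only when its dagrun is running, so stale queued tasks in failed runs cannot hold slots. Plain dicts stand in for task instances:

```python
def occupied_slots(task_instances):
    """Count RUNNING plus QUEUED task instances, but only inside running
    dagruns, per the constraint described above."""
    return sum(
        1 for ti in task_instances
        if ti["state"] in ("running", "queued")
        and ti["dagrun_state"] == "running"
    )

tis = [
    {"state": "running", "dagrun_state": "running"},
    {"state": "queued",  "dagrun_state": "running"},
    {"state": "queued",  "dagrun_state": "failed"},   # must not count
]
```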






[jira] [Assigned] (AIRFLOW-1764) Web Interface should not use experimental api

2018-05-31 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1764:
--

Assignee: Niels Zeilemaker  (was: 黄晓明)

> Web Interface should not use experimental api
> -
>
> Key: AIRFLOW-1764
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1764
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Reporter: Niels Zeilemaker
>Assignee: Niels Zeilemaker
>Priority: Major
> Fix For: 1.9.0
>
>
> The web interface should not use the experimental api as the authentication 
> options differ between the two. This means that the latest_runs call should 
> be moved into the web interface.





[jira] [Assigned] (AIRFLOW-1764) Web Interface should not use experimental api

2018-05-31 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1764:
--

Assignee: 黄晓明  (was: Niels Zeilemaker)

> Web Interface should not use experimental api
> -
>
> Key: AIRFLOW-1764
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1764
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Reporter: Niels Zeilemaker
>Assignee: 黄晓明
>Priority: Major
> Fix For: 1.9.0
>
>
> The web interface should not use the experimental api as the authentication 
> options differ between the two. This means that the latest_runs call should 
> be moved into the web interface.





[jira] [Assigned] (AIRFLOW-2462) airflow.contrib.auth.backends.password_auth.PasswordUser exists bug

2018-05-26 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2462:
--

Assignee: froginwell

> airflow.contrib.auth.backends.password_auth.PasswordUser has a bug
> ---
>
> Key: AIRFLOW-2462
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2462
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, contrib
>Affects Versions: 1.9.0
>Reporter: froginwell
>Assignee: froginwell
>Priority: Blocker
>
> The setter in PasswordUser:
> {code:python}
> @password.setter
> def _set_password(self, plaintext):
>     self._password = generate_password_hash(plaintext, 12)
>     if PY3:
>         self._password = str(self._password, 'utf-8')
> {code}
> The decorated function _set_password should be renamed to password; otherwise 
> the setter is never registered on the password property.
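The naming issue above can be reproduced without any dependencies. This is a sketch using plain Python properties (the SQLAlchemy-backed model behaves analogously), with "hash:" standing in for generate_password_hash:

```python
# Decorating a function named _set_password with @password.setter binds the
# writable property to the name _set_password, so `password` itself never
# gains a setter and assignment to it fails.
class BrokenUser:
    def __init__(self):
        self._password = None

    @property
    def password(self):
        return self._password

    @password.setter
    def _set_password(self, plaintext):   # wrong name, as in the report
        self._password = "hash:" + plaintext


class FixedUser:
    def __init__(self):
        self._password = None

    @property
    def password(self):
        return self._password

    @password.setter
    def password(self, plaintext):        # renamed as proposed
        # "hash:" stands in for generate_password_hash(plaintext, 12)
        self._password = "hash:" + plaintext


user = FixedUser()
user.password = "secret"
assert user._password == "hash:secret"

try:
    BrokenUser().password = "secret"      # no setter was registered
except AttributeError as exc:
    print("broken:", exc)
```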



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2462) airflow.contrib.auth.backends.password_auth.PasswordUser has a bug

2018-05-26 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2462:
--

Assignee: (was: froginwell)

> airflow.contrib.auth.backends.password_auth.PasswordUser has a bug
> ---
>
> Key: AIRFLOW-2462
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2462
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, contrib
>Affects Versions: 1.9.0
>Reporter: froginwell
>Priority: Blocker
>
> The setter in PasswordUser:
> {code:python}
> @password.setter
> def _set_password(self, plaintext):
>     self._password = generate_password_hash(plaintext, 12)
>     if PY3:
>         self._password = str(self._password, 'utf-8')
> {code}
> The decorated function _set_password should be renamed to password; otherwise 
> the setter is never registered on the password property.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-1501) Google Cloud Storage delete operator

2018-04-27 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1501:
--

Assignee: Guillermo Rodríguez Cano  (was: Yu Ishikawa)

> Google Cloud Storage delete operator
> 
>
> Key: AIRFLOW-1501
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1501
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, operators
>Reporter: Yu Ishikawa
>Assignee: Guillermo Rodríguez Cano
>Priority: Major
>
> h2. Goals
> - Implement a new feature to delete objects on Google Cloud Storage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2352) Airflow isn't picking up earlier periods after DAG definition update

2018-04-21 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2352:
--

Assignee: Alex Lumpov

> Airflow isn't picking up earlier periods after DAG definition update
> 
>
> Key: AIRFLOW-2352
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2352
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0
>Reporter: Slawomir Krysiak
>Assignee: Alex Lumpov
>Priority: Major
> Attachments: Screen Shot 2018-04-20 at 5.04.12 PM.png
>
>
> Hi,
>  
> It would be nice to be able to modify the period range (a.k.a. start_date) per 
> dag/subdag and have the scheduler pick it up. Not sure whether this should be 
> a feature request or a bug report, but I was under the assumption that it 
> already works that way. For some reason that doesn't seem to be the case in 
> 1.9.0, which I'm using for my POC. Attaching my message from gitter... BTW, it 
> seems that many questions come up on that channel but they don't seem 
> to be addressed promptly.
> Thanks,
> Slawomir
>  
> P.S. It would probably be helpful to be able to submit an 'end_date' 
> parameter to DAG/SubDAG... there may be datasets that are no longer produced, 
> yet they still have some period range extracted. Evolving transformation 
> pipelines would definitely benefit from this kind of option. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-273) Vectorized Logos

2018-04-19 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-273:
-

Assignee: Ivan Vitoria  (was: George Leslie-Waksman)

> Vectorized Logos
> 
>
> Key: AIRFLOW-273
> URL: https://issues.apache.org/jira/browse/AIRFLOW-273
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: George Leslie-Waksman
>Assignee: Ivan Vitoria
>Priority: Trivial
>
> There has been interest on the mailing list in an SVG version of the logo.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2287) Missing and incorrect license headers

2018-04-14 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2287:
--

Assignee: Bolke de Bruin

> Missing and incorrect license headers
> -
>
> Key: AIRFLOW-2287
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2287
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
>Priority: Blocker
> Fix For: 2.0.0
>
>
> * {color:#454545}a few files are missing licenses, like docs/Makefile{color}
>  * {color:#454545}please fix year in notice ("2016 and onwards" makes it a 
> little hard to work out when copyright would expire){color}
>  * {color:#454545}LICENSE is OK but some license texts are missing, i.e. 
> Bootstrap Toggle, normalize.css, parallel.js. Note that in order to comply 
> with the terms of the licenses, the full text of the license MUST be 
> included.{color}
>  * {color:#454545}also note that ace and d3 are under a BSD 3-clause, not a 
> BSD 2-clause, license{color}
>  * {color:#454545}A large number of files are missing the correct ASF 
> header (see below){color}
>  ** {color:#454545}Re the incorrect header: not perfect, but this shows the 
> scope of the issue:{color}
>  *** {color:#454545}find . -name "*.*" -exec grep "contributor license" {} 
> \; -print | wc{color}
>  *** {color:#454545}find . -name "*.*" -exec grep 
> "http://www.apache.org/licenses/LICENSE-2.0" {} \; -print | wc{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2118) get_pandas_df does not always pass a list of rows to be parsed

2018-03-05 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2118:
--

Assignee: Diane Ivy

> get_pandas_df does not always pass a list of rows to be parsed
> --
>
> Key: AIRFLOW-2118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2118
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, hooks
>Affects Versions: 1.9.0
> Environment: pandas-gbp 0.3.1
>Reporter: Diane Ivy
>Assignee: Diane Ivy
>Priority: Minor
>  Labels: easyfix
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> In get_pandas_df, if only one page is returned, the loop below pops off 
> individual rows instead of pages, so gbq_parse_data parses them incorrectly:
> {code:python}
> while len(pages) > 0:
>     page = pages.pop()
>     dataframe_list.append(gbq_parse_data(schema, page))
> {code}
> Possible solution:
> {code:python}
> from google.cloud import bigquery
> if isinstance(pages[0], bigquery.table.Row):
>     pages = [pages]
> {code}
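The proposed guard can be sketched without the BigQuery client installed; Row here is a stand-in for google.cloud.bigquery.table.Row:

```python
class Row(tuple):
    """Stand-in for google.cloud.bigquery.table.Row."""


def normalize_pages(pages):
    # If the client handed back a single page (a flat list of rows) rather
    # than a list of pages, wrap it once so the pop-and-parse loop consumes
    # pages, not individual rows.
    if pages and isinstance(pages[0], Row):
        pages = [pages]
    return pages


one_page = [Row((1, "a")), Row((2, "b"))]
many_pages = [[Row((1, "a"))], [Row((2, "b"))]]

assert normalize_pages(one_page) == [one_page]
assert normalize_pages(many_pages) == many_pages
```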



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2146) Initialize default Google BigQuery Connection with valid conn_type & Fix broken DBApiHook

2018-02-25 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2146:
--

Assignee: (was: Kaxil Naik)

> Initialize default Google BigQuery Connection with valid conn_type & Fix 
> broken DBApiHook
> -
>
> Key: AIRFLOW-2146
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2146
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, gcp
>Reporter: Kaxil Naik
>Priority: Major
> Fix For: 1.10.0
>
>
> `airflow initdb` creates a connection with conn_id='bigquery_default' and 
> conn_type='bigquery'. However, bigquery is not a valid conn_type, according 
> to models.Connection._types, and BigQuery connections should use the 
> google_cloud_platform conn_type.
> Also, as [renanleme|https://github.com/renanleme] mentioned 
> [here|https://github.com/apache/incubator-airflow/pull/3031#issuecomment-368132910],
>  the DAGs he created break when using `get_records()` from 
> BigQueryHook, which extends DbApiHook.
> *Error Log*:
> {code}
> Traceback (most recent call last):
>   File "/src/apache-airflow/airflow/models.py", line 1519, in _run_raw_task
> result = task_copy.execute(context=context)
>   File "/airflow/dags/lib/operators/test_operator.py", line 21, in execute
> records = self._get_db_hook(self.source_conn_id).get_records(self.sql)
>   File "/src/apache-airflow/airflow/hooks/base_hook.py", line 92, in 
> get_records
> raise NotImplementedError()
> {code}
> *Dag*:
> {code:python}
> from datetime import datetime
> from airflow import DAG
> from lib.operators.test_operator import TestOperator
> default_args = {
>     'depends_on_past': False,
>     'start_date': datetime(2018, 2, 21),
> }
> dag = DAG(
>     'test_dag',
>     default_args=default_args,
>     schedule_interval='0 6 * * *'
> )
> sql = '''
> SELECT id from YOUR_BIGQUERY_TABLE limit 10
> '''
> compare_grouped_event = TestOperator(
>     task_id='test_operator',
>     source_conn_id='gcp_airflow',
>     sql=sql,
>     dag=dag
> )
> {code}
> *Operator*:
> {code:python}
> from airflow.hooks.base_hook import BaseHook
> from airflow.models import BaseOperator
> from airflow.utils.decorators import apply_defaults
> class TestOperator(BaseOperator):
>     @apply_defaults
>     def __init__(
>             self,
>             sql,
>             source_conn_id=None,
>             *args, **kwargs):
>         super(TestOperator, self).__init__(*args, **kwargs)
>         self.sql = sql
>         self.source_conn_id = source_conn_id
>
>     def execute(self, context=None):
>         records = self._get_db_hook(self.source_conn_id).get_records(self.sql)
>         self.log.info('Fetched records from source')
>
>     @staticmethod
>     def _get_db_hook(conn_id):
>         return BaseHook.get_hook(conn_id=conn_id)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2058) Scheduler uses MainThread for DAG file processing

2018-02-15 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2058:
--

Assignee: Yang Pan

> Scheduler uses MainThread for DAG file processing
> -
>
> Key: AIRFLOW-2058
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2058
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.9.0
> Environment: Ubuntu, Airflow 1.9, Sequential executor
>Reporter: Yang Pan
>Assignee: Yang Pan
>Priority: Blocker
>
> By reading the [source code 
> |https://github.com/apache/incubator-airflow/blob/61ff29e578d1121ab4606fe122fb4e2db8f075b9/airflow/utils/dag_processing.py#L538]
>  it appears the scheduler will process each DAG file, either a .py or .zip, 
> using a new process. 
>  
> If I understand correctly, in theory processing a .zip file works like this: 
> the dedicated process adds the .zip file to the PYTHONPATH and loads the 
> file's module and dependencies. When the DAG read is done, the process is 
> destroyed, and since the PYTHONPATH is process-scoped, it doesn't pollute 
> other processes.
>  
> However, printing out the thread and process IDs shows that the Airflow 
> scheduler can sometimes accidentally reuse the main process instead of 
> creating a new one, and that's when the collision happens.
>  
> Here is a snippet of the PYTHONPATH while advanced_dag_dependency-1.zip is 
> being processed. As you can see, when it's executed by MainThread it contains 
> other .zip files; when a dedicated thread is used, only the required .zip is 
> added.
>  
> sys.path :['/root/airflow/dags/yang_subdag_2.zip', 
> '/root/airflow/dags/yang_subdag_2.zip', 
> '/root/airflow/dags/yang_subdag_1.zip', 
> '/root/airflow/dags/yang_subdag_1.zip', 
> '/root/airflow/dags/advanced_dag_dependency-2.zip', 
> '/root/airflow/dags/advanced_dag_dependency-2.zip', 
> '/root/airflow/dags/advanced_dag_dependency-1.zip', 
> '/root/airflow/dags/advanced_dag_dependency-1.zip', 
> '/root/airflow/dags/yang_subdag_1', '/usr/local/bin', '/usr/lib/python2.7', 
> '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', 
> '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', 
> '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', 
> '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config', 
> '/root/airflow/dags', '/root/airflow/plugins'] 
> Print from MyFirstOperator in Dag 1 
> process id: 5059 
> thread id: <_MainThread(*MainThread*, started 140339858560768)> 
>  
> sys.path :[u'/root/airflow/dags/advanced_dag_dependency-1.zip', 
> '/usr/local/bin', '/usr/lib/python2.7', 
> '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', 
> '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', 
> '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', 
> '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config', 
> '/root/airflow/dags', '/root/airflow/plugins'] 
> Print from MyFirstOperator in Dag 1 
> process id: 5076 
> thread id: <_MainThread(*DagFileProcessor283*, started 140137838294784)> 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2030) dbapi_hook KeyError: 'i' at line 225

2018-01-25 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2030:
--

Assignee: Manish Kumar Untwal

> dbapi_hook KeyError: 'i' at line 225
> 
>
> Key: AIRFLOW-2030
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2030
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: 1.9.0
>Reporter: Manish Kumar Untwal
>Assignee: Manish Kumar Untwal
>Priority: Major
>
> When zero rows are inserted, no local variable `i` is ever defined, so the 
> logging call at line 225 of dbapi_hook.py raises a KeyError for the local 
> variable 'i'.
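A minimal reproduction of the failure mode described above, assuming the logging line formats its message with locals() (the message text is illustrative, not Airflow's exact wording): `i` is only bound inside the loop, so with zero rows the final statement raises KeyError: 'i'.

```python
def insert_rows(rows):
    # Reduced sketch of the pattern in dbapi_hook.insert_rows().
    for i, row in enumerate(rows, 1):
        pass  # the real code inserts the row here
    # With rows == [], `i` never appears in locals(), so this raises
    # KeyError: 'i'.
    return "Done loading. Loaded a total of {i} rows".format(**locals())


print(insert_rows([("a",), ("b",)]))  # works: i == 2

try:
    insert_rows([])                   # i was never assigned
except KeyError as exc:
    print("KeyError for local variable", exc)
```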



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-1582) Improve logging structure of Airflow

2017-09-09 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1582:
--

Assignee: Fokko Driesprong

> Improve logging structure of Airflow
> 
>
> Key: AIRFLOW-1582
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1582
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>
> Hi,
> I would like to improve the logging within Airflow. Currently the logging is 
> missing some consistency across the project. I would like to:
> - Remove airflow/utils/logging.py and move everything to /airflow/utils/log/*
> - Initialise local loggers with the name of the class
> - Move the settings of the logging to one central place
> - Remove setting explicit logging levels within the code
> Future PRs:
> - Remove the verbose boolean settings, which make little sense; more verbose 
> logging should be requested by increasing the logging verbosity, not by a 
> boolean variable.
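The "initialise local loggers with the name of the class" point above can be sketched with a small mixin; the names here are illustrative, not necessarily what Airflow ends up using:

```python
import logging


class LoggingMixin:
    # Expose a logger named after the concrete class, so records from
    # e.g. MyHook are tagged "<module>.MyHook" without every class
    # repeating the getLogger boilerplate.
    @property
    def log(self):
        cls = self.__class__
        return logging.getLogger(f"{cls.__module__}.{cls.__name__}")


class MyHook(LoggingMixin):
    def run(self):
        self.log.info("running")


print(MyHook().log.name)  # ends with ".MyHook"
```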



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (AIRFLOW-1463) Scheduler does not reschedule tasks in QUEUED state

2017-08-17 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1463:
--

Assignee: (was: Stanislav Pak)

> Scheduler does not reschedule tasks in QUEUED state
> ---
>
> Key: AIRFLOW-1463
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1463
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
> Environment: Ubuntu 14.04
> Airflow 1.8.0
> SQS backed task queue, AWS RDS backed meta storage
> DAG folder is synced by script on code push: archive is downloaded from s3, 
> unpacked, moved, install script is run. airflow executable is replaced with 
> symlink pointing to the latest version of code, no airflow processes are 
> restarted.
>Reporter: Stanislav Pak
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Our pipeline-related code is deployed almost simultaneously on all airflow 
> boxes: the scheduler+webserver box and the worker boxes. A common python 
> package is deployed on those boxes on every other code push (3-5 deployments 
> per hour). Due to installation specifics, a DAG that imports a module from 
> that package might fail. If the DAG import fails when a worker runs a task, 
> the task is still removed from the queue but its state is not changed, so the 
> task stays in the QUEUED state forever.
> Besides the described case, this can also happen because of DAG-update lag in 
> the scheduler: a task can be scheduled with the old DAG while the worker runs 
> it with a new DAG that fails to import.
> There might be other scenarios as well.
> Proposal:
> Catch errors when importing DAG on task run and clear task instance state if 
> import fails. This should fix transient issues of this kind.
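The proposal can be sketched as follows; the function and state names are illustrative, not Airflow's actual API:

```python
QUEUED = "queued"
NONE = None  # cleared state: the scheduler can reschedule the task


class TaskInstance:
    def __init__(self, task_id):
        self.task_id = task_id
        self.state = QUEUED


def import_dag(path):
    # Stand-in for the real DAG-file import; a half-finished deploy
    # makes this raise.
    raise ImportError(f"cannot import {path}")


def run_task(ti, dag_path):
    # Catch import errors on task run and clear the task instance state,
    # instead of leaving it QUEUED forever with no worker attached.
    try:
        import_dag(dag_path)
    except ImportError:
        ti.state = NONE
        return False
    return True


ti = TaskInstance("extract")
assert run_task(ti, "dags/broken.py") is False
assert ti.state is NONE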



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (AIRFLOW-342) exception in 'airflow scheduler' : Connection reset by peer

2017-05-19 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-342:
-

Assignee: Hila Visan

>  exception in 'airflow scheduler' : Connection reset by peer
> 
>
> Key: AIRFLOW-342
> URL: https://issues.apache.org/jira/browse/AIRFLOW-342
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, scheduler
>Affects Versions: Airflow 1.7.1.3
> Environment: OS: Red Hat Enterprise Linux Server 7.2 (Maipo)
> Python: 2.7.5
> Airflow: 1.7.1.3
>Reporter: Hila Visan
>Assignee: Hila Visan
>
> The 'airflow scheduler' command throws an exception when run. 
> Despite the exception, the workers run the tasks from the queues as expected.
> Error details:
>  
> [2016-06-30 19:00:10,130] {jobs.py:758} ERROR - [Errno 104] Connection reset 
> by peer
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 755, in 
> _execute
> executor.heartbeat()
>   File "/usr/lib/python2.7/site-packages/airflow/executors/base_executor.py", 
> line 107, in heartbeat
> self.sync()
>   File 
> "/usr/lib/python2.7/site-packages/airflow/executors/celery_executor.py", line 
> 74, in sync
> state = async.state
>   File "/usr/lib/python2.7/site-packages/celery/result.py", line 394, in state
> return self._get_task_meta()['status']
>   File "/usr/lib/python2.7/site-packages/celery/result.py", line 339, in 
> _get_task_meta
> return self._maybe_set_cache(self.backend.get_task_meta(self.id))
>   File "/usr/lib/python2.7/site-packages/celery/backends/amqp.py", line 163, 
> in get_task_meta
> binding.declare()
>   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 521, in 
> declare
> self.exchange.declare(nowait)
>   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 174, in 
> declare
> nowait=nowait, passive=passive,
>   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 615, in 
> exchange_declare
> self._send_method((40, 10), args)
>   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, 
> in _send_method
> self.channel_id, method_sig, args, content,
>   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, 
> in write_method
> write_frame(1, channel, payload)
>   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in 
> write_frame
> frame_type, channel, size, payload, 0xce,
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
> return getattr(self._sock,name)(*args)
> error: [Errno 104] Connection reset by peer
> [2016-06-30 19:00:10,131] {jobs.py:759} ERROR - Tachycardia!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AIRFLOW-93) Allow specifying multiple task execution deltas for ExternalTaskSensors

2016-06-13 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-93:


Assignee: Jonas Esser  (was: Bence Nagy)

> Allow specifying multiple task execution deltas for ExternalTaskSensors
> ---
>
> Key: AIRFLOW-93
> URL: https://issues.apache.org/jira/browse/AIRFLOW-93
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: Airflow 1.7.0
>Reporter: Bence Nagy
>Assignee: Jonas Esser
>Priority: Minor
>
> I have some {{ExternalTaskSensor}}s with a schedule interval of 1 hour, where 
> the task depended upon has a schedule interval of 10 minutes. Right now I'm 
> depending only on the HH:50 execution, but it would be nice if I could 
> specify that I need all executions from HH:00 to HH:50 to be successful; 
> otherwise, if the depended-upon tasks are executed out of order, the sensor 
> will pass even though I don't have data for the earlier parts of the hour yet.
> A workaround would be to have one sensor for each 10 minutes of the hour, but 
> that's too nasty for me, especially if my sensor's schedule interval were 
> 1 day.
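A sketch of what the requested range could look like. The execution_deltas parameter name is hypothetical (ExternalTaskSensor took a single execution_delta at the time); the helper just enumerates the offsets an hourly sensor would need to cover a 10-minute upstream schedule:

```python
from datetime import timedelta


def execution_deltas(step_minutes=10, interval_minutes=60):
    # One delta per upstream run covered by the sensor's interval:
    # 0, 10, ..., 50 minutes back for an hourly sensor over a
    # 10-minute upstream schedule.
    return [timedelta(minutes=m)
            for m in range(0, interval_minutes, step_minutes)]


deltas = execution_deltas()
print(len(deltas), deltas[-1])  # 6 deltas, the last one 50 minutes back
```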



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (AIRFLOW-93) Allow specifying multiple task execution deltas for ExternalTaskSensors

2016-06-13 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-93:


Assignee: (was: Bence Nagy)

> Allow specifying multiple task execution deltas for ExternalTaskSensors
> ---
>
> Key: AIRFLOW-93
> URL: https://issues.apache.org/jira/browse/AIRFLOW-93
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: Airflow 1.7.0
>Reporter: Bence Nagy
>Priority: Minor
>
> I have some {{ExternalTaskSensor}}s with a schedule interval of 1 hour, where 
> the task depended upon has a schedule interval of 10 minutes. Right now I'm 
> depending only on the HH:50 execution, but it would be nice if I could 
> specify that I need all executions from HH:00 to HH:50 to be successful; 
> otherwise, if the depended-upon tasks are executed out of order, the sensor 
> will pass even though I don't have data for the earlier parts of the hour yet.
> A workaround would be to have one sensor for each 10 minutes of the hour, but 
> that's too nasty for me, especially if my sensor's schedule interval were 
> 1 day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (AIRFLOW-93) Allow specifying multiple task execution deltas for ExternalTaskSensors

2016-06-13 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-93:


Assignee: Bence Nagy

> Allow specifying multiple task execution deltas for ExternalTaskSensors
> ---
>
> Key: AIRFLOW-93
> URL: https://issues.apache.org/jira/browse/AIRFLOW-93
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: Airflow 1.7.0
>Reporter: Bence Nagy
>Assignee: Bence Nagy
>Priority: Minor
>
> I have some {{ExternalTaskSensor}}s with a schedule interval of 1 hour, where 
> the task depended upon has a schedule interval of 10 minutes. Right now I'm 
> depending only on the HH:50 execution, but it would be nice if I could 
> specify that I need all executions from HH:00 to HH:50 to be successful; 
> otherwise, if the depended-upon tasks are executed out of order, the sensor 
> will pass even though I don't have data for the earlier parts of the hour yet.
> A workaround would be to have one sensor for each 10 minutes of the hour, but 
> that's too nasty for me, especially if my sensor's schedule interval were 
> 1 day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)