[jira] [Created] (AIRFLOW-6586) GCSUploadSessionCompleteSensor breaks in reschedule mode.
Jacob Ferriero created AIRFLOW-6586:
---

Summary: GCSUploadSessionCompleteSensor breaks in reschedule mode.
Key: AIRFLOW-6586
URL: https://issues.apache.org/jira/browse/AIRFLOW-6586
Project: Apache Airflow
Issue Type: Bug
Components: operators
Affects Versions: 1.10.3
Reporter: Jacob Ferriero

This sensor is stateful and loses its state between reschedules. We should:
# Warn about this in the docstring.
# Add a `poke_mode_only` class decorator for sensors that aren't safe in reschedule mode.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
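AIRFLOW-6586 proposes a `poke_mode_only` class decorator. A minimal sketch of the idea, assuming sensors expose a `mode` attribute that is either `'poke'` or `'reschedule'`; `SketchSensor` and `UploadSessionSensor` are hypothetical stand-ins, not Airflow classes, and this is not Airflow's actual implementation:

```python
class SketchSensor:
    """Hypothetical stand-in for a sensor base class with a `mode` attribute."""

    def __init__(self, mode: str = "poke") -> None:
        self.mode = mode


def poke_mode_only(cls: type) -> type:
    """Class decorator that refuses any `mode` other than 'poke'.

    Every assignment to `mode` (including the one made in __init__) is
    checked, so the sensor cannot be switched to 'reschedule' later either.
    """
    original_setattr = cls.__setattr__

    def guarded_setattr(self, name, value):
        if name == "mode" and value != "poke":
            raise ValueError(
                f"{cls.__name__} keeps state between pokes and only supports "
                f"mode='poke', got {value!r}"
            )
        original_setattr(self, name, value)

    cls.__setattr__ = guarded_setattr
    return cls


@poke_mode_only
class UploadSessionSensor(SketchSensor):
    """Hypothetical stateful sensor: `previous_count` would be lost on reschedule."""

    def __init__(self, mode: str = "poke") -> None:
        super().__init__(mode=mode)
        self.previous_count = None
```

Guarding `__setattr__` rather than only `__init__` also catches code that flips `mode` after construction, and subclasses of a decorated sensor inherit the guard.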
[jira] [Created] (AIRFLOW-5568) Add Hook / Operators for GCP Healthcare API
Jacob Ferriero created AIRFLOW-5568:
---

Summary: Add Hook / Operators for GCP Healthcare API
Key: AIRFLOW-5568
URL: https://issues.apache.org/jira/browse/AIRFLOW-5568
Project: Apache Airflow
Issue Type: New Feature
Components: hooks, operators
Affects Versions: 1.10.5
Reporter: Jacob Ferriero

It would be useful to have a hook for the Healthcare API, plus operators / sensors for its long-running operations (https://cloud.google.com/healthcare/docs/how-tos/long-running-operations), such as:
* import / export of various formats
* de-identification of datasets

[https://cloud.google.com/healthcare/docs/apis]

Note that this would be a good candidate to illustrate the kind of AsyncOperator described in AIRFLOW-5567.
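A sensor for the Healthcare API's long-running operations would essentially poll the Operation resource until it reports `done`. A minimal sketch of one poke, assuming the operation is fetched as a JSON dict with the standard `done`, `error`, and `response` fields (`done` may be omitted while the operation is still running); `get_operation` and `OperationFailed` are hypothetical names, and the fetch is injected so the logic runs without network access:

```python
from typing import Any, Callable, Dict

Operation = Dict[str, Any]


class OperationFailed(Exception):
    """Raised when a long-running operation finishes with an error."""


def poke_operation(get_operation: Callable[[str], Operation], name: str) -> bool:
    """One sensor-style poke of a long-running operation.

    `get_operation` is a hypothetical callable that returns the operation
    resource for `name` (for example, backed by an operations.get request).
    """
    op = get_operation(name)
    if not op.get("done", False):
        return False  # still running: the sensor should poke again later
    if "error" in op:
        # A finished operation carries either `error` or `response`.
        message = op["error"].get("message", op["error"])
        raise OperationFailed(f"{name} failed: {message}")
    return True
```

Raising on `error` rather than returning `False` keeps a failed import, export, or de-identification job from being poked until the sensor times out.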
[jira] [Created] (AIRFLOW-5567) Improved primitive for building Operators that benefit from reschedule mode
Jacob Ferriero created AIRFLOW-5567:
---

Summary: Improved primitive for building Operators that benefit from reschedule mode
Key: AIRFLOW-5567
URL: https://issues.apache.org/jira/browse/AIRFLOW-5567
Project: Apache Airflow
Issue Type: Improvement
Components: models, operators
Affects Versions: 1.10.5
Reporter: Jacob Ferriero
Assignee: Jacob Ferriero

Airflow operators (derived from BaseOperator) often kick off a long-running task and then wait / poll, blocking a worker slot until that task completes. This can be problematic in environments with many long-running tasks. BaseSensorOperator was improved with a `reschedule` mode to solve the similar problem of long-running sensors blocking a worker while they poll. This issue tracks how we could provide a primitive that makes it easy to develop operators for long-running tasks that reschedule a `poll` operation rather than blocking in their `execute` method.
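One possible shape for the primitive described in AIRFLOW-5567 is "submit once, then poll across reschedules". A minimal sketch under stated assumptions: `AsyncOperatorSketch` and `RescheduleSignal` are hypothetical names, `state` stands in for storage that survives between task tries (XCom or a database table might play that role in Airflow, which is an assumption), and raising `RescheduleSignal` is loosely modeled on how a sensor in reschedule mode hands its slot back to the scheduler:

```python
from typing import Any, Dict, Optional


class RescheduleSignal(Exception):
    """Hypothetical signal: free the worker slot and retry after `poke_interval`."""

    def __init__(self, poke_interval: int) -> None:
        super().__init__(f"reschedule in {poke_interval}s")
        self.poke_interval = poke_interval


class AsyncOperatorSketch:
    """Hypothetical base class: subclasses implement `submit` and `poll`."""

    def __init__(self, state: Dict[str, Any], poke_interval: int = 60) -> None:
        self.state = state  # must outlive this instance, unlike instance attributes
        self.poke_interval = poke_interval

    def submit(self) -> str:
        """Start the long-running job and return its identifier."""
        raise NotImplementedError

    def poll(self, job_id: str) -> bool:
        """Return True once the job identified by `job_id` has finished."""
        raise NotImplementedError

    def execute(self) -> str:
        job_id: Optional[str] = self.state.get("job_id")
        if job_id is None:
            job_id = self.submit()          # first try: start the job
            self.state["job_id"] = job_id   # persist so later tries don't resubmit
        if not self.poll(job_id):
            raise RescheduleSignal(self.poke_interval)  # release the worker slot
        return job_id
```

Because a fresh instance is created for each try, anything the operator needs between polls (here just `job_id`) has to live in `state`, not on `self`.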
[jira] [Created] (AIRFLOW-5520) DataflowPythonOperator dependency management requires side effects
Jacob Ferriero created AIRFLOW-5520:
---

Summary: DataflowPythonOperator dependency management requires side effects
Key: AIRFLOW-5520
URL: https://issues.apache.org/jira/browse/AIRFLOW-5520
Project: Apache Airflow
Issue Type: Improvement
Components: gcp
Affects Versions: 1.10.2
Reporter: Jacob Ferriero

When using DataflowPythonOperator, it is difficult to manage the Apache Beam version (and other Python dependencies) without affecting your entire Airflow environment. It seems the Dataflow hook just submits a Python subprocess. The operator / hook should be improved to isolate Python dependencies when running `py_file`. Perhaps this could be achieved with a virtual environment (similar to PythonVirtualenvOperator). For Beam it is customary to specify a --requirements_file or --setup_file to manage Python dependencies; we could install from one of these into the venv to set it up.
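The virtualenv approach suggested in AIRFLOW-5520 boils down to three commands: create a venv, install the pipeline's requirements into it, and run `py_file` with the venv's interpreter. A minimal sketch of building those commands, assuming a POSIX venv layout (`bin/python`); `venv_commands` and its parameters are hypothetical, while `--requirements_file` is Beam's own pipeline option:

```python
import posixpath
from typing import List, Optional


def venv_commands(
    venv_dir: str,
    py_file: str,
    requirements_file: Optional[str] = None,
    pipeline_options: Optional[List[str]] = None,
    base_python: str = "python3",
) -> List[List[str]]:
    """Build the commands that run a Beam pipeline inside an isolated venv."""
    venv_python = posixpath.join(venv_dir, "bin", "python")  # POSIX layout assumed
    cmds = [[base_python, "-m", "venv", venv_dir]]
    if requirements_file:
        # Install the pipeline's own dependencies (e.g. its apache-beam pin)
        # into the venv only, leaving the Airflow environment untouched.
        cmds.append([venv_python, "-m", "pip", "install", "-r", requirements_file])
    run = [venv_python, py_file] + list(pipeline_options or [])
    if requirements_file:
        # Also hand the file to Beam so remote workers install the same packages.
        run.append(f"--requirements_file={requirements_file}")
    cmds.append(run)
    return cmds
```

Each command could then be executed in order with `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the behaviour easy to test without creating a real environment.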
[jira] [Created] (AIRFLOW-4983) DataflowPythonOperator should be able to submit pipelines with python3
Jacob Ferriero created AIRFLOW-4983:
---

Summary: DataflowPythonOperator should be able to submit pipelines with python3
Key: AIRFLOW-4983
URL: https://issues.apache.org/jira/browse/AIRFLOW-4983
Project: Apache Airflow
Issue Type: Improvement
Components: gcp, hooks, operators
Affects Versions: 1.10.2, 1.10.4, 2.0.0, 1.10.5
Reporter: Jacob Ferriero
Assignee: Jacob Ferriero

Currently the DataflowHook hard-codes the python2 interpreter. Apache Beam is beginning to support Python 3, and we should support submitting those pipelines. We should add a `py_interpreter` argument to the operator and hook that defaults to 'python2' (so the change is not interface breaking).
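The `py_interpreter` proposal in AIRFLOW-4983 amounts to replacing the hard-coded first element of the submitted command. A minimal sketch; `build_dataflow_python_cmd` and its parameter layout are hypothetical and not the hook's actual signature:

```python
from typing import List, Optional


def build_dataflow_python_cmd(
    py_file: str,
    py_options: Optional[List[str]] = None,
    options: Optional[List[str]] = None,
    py_interpreter: str = "python2",
) -> List[str]:
    """Build the pipeline-submission command with a configurable interpreter.

    Defaulting to 'python2' keeps existing callers' behaviour unchanged,
    while Python 3 pipelines can pass py_interpreter='python3'.
    """
    # Interpreter flags come before the script; pipeline options come after it.
    return [py_interpreter] + list(py_options or []) + [py_file] + list(options or [])
```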
[jira] [Created] (AIRFLOW-4397) Add GCSUploadSessionCompleteSensor
Jacob Ferriero created AIRFLOW-4397:
---

Summary: Add GCSUploadSessionCompleteSensor
Key: AIRFLOW-4397
URL: https://issues.apache.org/jira/browse/AIRFLOW-4397
Project: Apache Airflow
Issue Type: New Feature
Components: contrib
Reporter: Jacob Ferriero
Assignee: Jacob Ferriero

I'd like to contribute a sensor for Google Cloud Storage that pokes a bucket until there has been sufficient time without a new file drop. Oftentimes a third-party vendor drops data into a bucket but doesn't send a success flag when it is done. This sensor would let you poke every n minutes to check whether more files have been added since the last poke and, once `inactivity_period` minutes have passed without a new file drop, return `True`. This would also make it possible to register SLA misses if data did not arrive by an expected time, with a configurable deadline past which the sensor fails. Optionally, the user could specify a minimum number of files for the sensor to succeed.

This would be my first time contributing to an OSS project, so please let me know if this is not the appropriate place to start.
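The decision logic described in AIRFLOW-4397 can be sketched independently of GCS. A minimal sketch, assuming each poke supplies the current object count under the watched prefix and the current time in seconds; `UploadSessionTracker` is a hypothetical name, and the failure deadline is assumed to be left to the sensor's existing `timeout`:

```python
from typing import Optional


class UploadSessionTracker:
    """Hypothetical sketch of the proposed sensor's inactivity check."""

    def __init__(self, inactivity_period: float, min_objects: int = 1) -> None:
        self.inactivity_period = inactivity_period
        self.min_objects = min_objects
        self.previous_count: Optional[int] = None
        self.last_activity: Optional[float] = None

    def poke(self, object_count: int, now: float) -> bool:
        if self.previous_count is None or object_count != self.previous_count:
            # First poke, or the number of objects changed: restart the quiet timer.
            self.previous_count = object_count
            self.last_activity = now
            return False
        assert self.last_activity is not None
        # No change since the last poke: succeed once the bucket has been quiet
        # for the whole inactivity period and enough objects have arrived.
        quiet_for = now - self.last_activity
        return quiet_for >= self.inactivity_period and object_count >= self.min_objects
```

For example, with `inactivity_period=600`, pokes at t=0 (1 object), t=300 (3 objects), and t=900 (still 3 objects) return False, False, True. Note that `previous_count` and `last_activity` live on the instance, which is exactly the state that does not survive a reschedule.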