[jira] [Commented] (AIRFLOW-3905) Allow using parameters for sql statement in SqlSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552690#comment-17552690 ]

ASF GitHub Bot commented on AIRFLOW-3905:

malthe commented on PR #4723: URL: https://github.com/apache/airflow/pull/4723#issuecomment-1152249671

Shouldn't the `parameters` be templated?

> Allow using parameters for sql statement in SqlSensor
>
> Key: AIRFLOW-3905
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3905
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Affects Versions: 1.10.2
> Reporter: Xiaodong Deng
> Assignee: Xiaodong Deng
> Priority: Minor
> Fix For: 1.10.3
>
> In most SQL-related operators/sensors, argument `parameters` is available to
> help render SQL command conveniently. But this is not available in SqlSensor
> yet.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
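The suggestion above (making `parameters` templated) matters because an untemplated field hands any Jinja expression to the database driver as a literal string. A minimal sketch of the difference, using stdlib `string.Template` as a stand-in for Airflow's Jinja rendering; the field names mirror SqlSensor, but the `events` query and the `$ds` placeholder are purely illustrative:

```python
from string import Template

def render(value, context):
    """Stand-in for Airflow's per-field template rendering (Jinja in the real thing)."""
    if isinstance(value, str):
        return Template(value).safe_substitute(context)
    if isinstance(value, dict):
        return {k: render(v, context) for k, v in value.items()}
    return value

context = {"ds": "2022-06-09"}

# A SqlSensor-style pair of fields: `sql` is already templated; the question
# raised above is whether `parameters` should be too.
sql = "SELECT COUNT(*) FROM events WHERE event_date = %(run_date)s"
parameters = {"run_date": "$ds"}  # "$ds" stands in for Jinja's "{{ ds }}"

untemplated = parameters                 # status quo: reaches the driver as-is
templated = render(parameters, context)  # if `parameters` joined template_fields

print(untemplated["run_date"])  # the literal string "$ds"
print(templated["run_date"])    # "2022-06-09"
```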
[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says it's queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540543#comment-17540543 ]

ASF GitHub Bot commented on AIRFLOW-5071:

potiuk commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1133832625

> > @turbaszek Let me make a PR later~ We are doing pressure tests these days and this problem had appeared often.
>
> Hey turbaszek, Any chance to have PR submitted, we are experiencing in 2.3.0 as well.

I think you wanted to call @ghostbody, who wanted to submit the fix, @vanducng .

> Thousands of Executor reports task instance X finished (success) although the
> task says it's queued. Was the task killed externally?
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, scheduler
> Affects Versions: 1.10.3
> Reporter: msempere
> Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png,
> image-2020-07-08-07-58-42-972.png
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands
> of daily messages like the following in the logs:
>
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says
> its queued. Was the task killed externally?
> ```
>
> -And it looks like this is also triggering thousands of daily emails because the
> flag to send email in case of failure is set to True.-
>
> I have Airflow set up to use Celery and Redis as a backend queue service.
[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says it's queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540527#comment-17540527 ]

ASF GitHub Bot commented on AIRFLOW-5071:

vanducng commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1133817633

> @turbaszek Let me make a PR later~ We are doing pressure tests these days and this problem had appeared often.

Hey turbaszek, any chance to have the PR submitted? We are experiencing this in 2.3.0 as well.
[jira] [Commented] (AIRFLOW-778) Metastore Partition Sensor Broken
[ https://issues.apache.org/jira/browse/AIRFLOW-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528879#comment-17528879 ]

ASF GitHub Bot commented on AIRFLOW-778:

deprosun commented on PR #2005: URL: https://github.com/apache/airflow/pull/2005#issuecomment-209271

Has to be

```python
MetastorePartitionSensor(
    dag=dag,
    task_id="t",
    schema="some_schema",
    table="some_table",
    partition_name="visit_date=some_date",  # without quotes
    sql="",  # have to provide an empty string due to inheritance issue
    # conn_id and mysql_conn_id have to be duplicated
    conn_id="hive_conn",
    mysql_conn_id="hive_conn",
    soft_fail=True,
    mode="reschedule",
    poke_interval=60,
    timeout=300,
)
```

> Metastore Partition Sensor Broken
>
> Key: AIRFLOW-778
> URL: https://issues.apache.org/jira/browse/AIRFLOW-778
> Project: Apache Airflow
> Issue Type: Bug
> Components: operators
> Reporter: Dan Davydov
> Assignee: Dan Davydov
> Priority: Blocker
>
> MetastorePartitionSensor always throws an exception on initialization due to
> 72cc8b3006576153aa30d27643807b4ae5dfb593 .
> Looks like the tests for this are only run if an explicit flag is set, which
> is how this got past CI.
> cc [~xuanji]
[jira] [Commented] (AIRFLOW-778) Metastore Partition Sensor Broken
[ https://issues.apache.org/jira/browse/AIRFLOW-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527795#comment-17527795 ]

ASF GitHub Bot commented on AIRFLOW-778:

deprosun commented on PR #2005: URL: https://github.com/apache/airflow/pull/2005#issuecomment-1109120353

I got this error when I tried to use the `MetastorePartitionSensor`:

```
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/sensors/sql.py", line 72, in _get_hook
    conn = BaseHook.get_connection(self.conn_id)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/hooks/base.py", line 68, in get_connection
    conn = Connection.get_connection_from_secrets(conn_id)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/connection.py", line 410, in get_connection_from_secrets
    raise AirflowNotFoundException(f"The conn_id `{conn_id}` isn't defined")
airflow.exceptions.AirflowNotFoundException: The conn_id `` isn't defined
```

What is the right way of using this sensor? I am not sure why I am getting the above exception. This is how I am using it:

```python
MetastorePartitionSensor(
    dag=dag,
    task_id="t",
    schema="some_schema",
    table="some_table",
    partition_name="visit_date='some_date'",
    mysql_conn_id="hive_prod_conn",
    soft_fail=True,
    mode="reschedule",
    poke_interval=60,
    timeout=300,
)
```
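The traceback above has the shape of a lazy-initialization mismatch: the subclass only fills in the parent's required `conn_id` later, while the hook lookup reads it earlier and sees an empty string. A minimal stdlib sketch of that failure mode, and of the duplicate-conn-id workaround shown in the later comment; the class names and logic are simplified stand-ins, not Airflow's actual code:

```python
class SqlSensorSketch:
    """Stand-in for a parent sensor that validates conn_id when a hook is needed."""

    def __init__(self, conn_id="", sql=""):
        self.conn_id = conn_id
        self.sql = sql

    def _get_hook(self):
        if not self.conn_id:
            # Mirrors the AirflowNotFoundException seen in the traceback.
            raise LookupError(f"The conn_id `{self.conn_id}` isn't defined")
        return ("hook", self.conn_id)

class MetastorePartitionSensorSketch(SqlSensorSketch):
    """Stand-in subclass that intends to derive conn_id from mysql_conn_id lazily."""

    def __init__(self, schema, table, partition_name, mysql_conn_id, **kwargs):
        self.mysql_conn_id = mysql_conn_id
        # conn_id would only be copied from mysql_conn_id on first poke, so any
        # earlier hook lookup sees "" unless the caller passes conn_id explicitly.
        super().__init__(**kwargs)

broken = MetastorePartitionSensorSketch(
    "some_schema", "some_table", "visit_date='some_date'", "hive_prod_conn"
)
try:
    broken._get_hook()
except LookupError as e:
    print(e)  # The conn_id `` isn't defined

works = MetastorePartitionSensorSketch(
    "some_schema", "some_table", "visit_date=some_date", "hive_conn",
    conn_id="hive_conn",  # the explicit duplication from the workaround
)
print(works._get_hook())
```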
[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523163#comment-17523163 ]

ASF GitHub Bot commented on AIRFLOW-3537:

potiuk commented on PR #4341: URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100727118

Take a look at other PRs people added with template fields. You will find a number of those if you search.

> Allow AWS ECS Operator to use templates in task_definition parameter
>
> Key: AIRFLOW-3537
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3537
> Project: Apache Airflow
> Issue Type: Wish
> Components: aws
> Reporter: tomoya tabata
> Assignee: tomoya tabata
> Priority: Minor
>
> The AWS ECS operator does not currently apply templates to the
> task_definition parameter.
> I'd like to allow AWS ECS Operator to use templates in the task_definition
> parameter.
[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523162#comment-17523162 ]

ASF GitHub Bot commented on AIRFLOW-3537:

potiuk commented on PR #4341: URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100726790

Just add the field. It's enough.
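"Just add the field" means appending the parameter name to the operator's `template_fields` tuple, which the base class walks when rendering templates. A minimal sketch of that mechanism, using stdlib `string.Template` in place of Airflow's Jinja machinery; the class names and the `$revision` placeholder are illustrative, not the real ECS operator:

```python
from string import Template

class BaseOperatorSketch:
    """Stand-in for BaseOperator: renders every attribute named in template_fields."""

    template_fields = ()

    def render_template_fields(self, context):
        for name in self.template_fields:
            value = getattr(self, name)
            if isinstance(value, str):
                setattr(self, name, Template(value).safe_substitute(context))

class EcsOperatorSketch(BaseOperatorSketch):
    # The whole fix discussed here: add "task_definition" to this tuple.
    template_fields = ("overrides", "task_definition")

    def __init__(self, task_definition, overrides=""):
        self.task_definition = task_definition
        self.overrides = overrides

op = EcsOperatorSketch(task_definition="my-family:$revision")
op.render_template_fields({"revision": "7"})
print(op.task_definition)  # my-family:7
```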
[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523158#comment-17523158 ]

ASF GitHub Bot commented on AIRFLOW-3537:

FridayPush commented on PR #4341: URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100718236

I appreciate the invitation @potiuk; generally this PR is what I would produce to make the change. I would likely have removed the `template_fields` test, as it is a value check. Additionally, the [BaseOperator already has tests](https://github.com/apache/airflow/blob/main/tests/models/test_baseoperator.py) that validate the behavior of changes to `template_fields`.

Per the original PR review, the only template rendering test I can find in other [AWS operators is the 'batch' operator](https://github.com/apache/airflow/blob/main/tests/providers/amazon/aws/operators/test_batch.py#L100). If I were to submit a new PR with the changes to template_fields, how should tests be handled? Would you lean towards removing the test or adding the per-field test referenced by mik-laj?
[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523038#comment-17523038 ]

ASF GitHub Bot commented on AIRFLOW-3537:

potiuk commented on PR #4341: URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100598087

> Wish this had been merged. A pain point for us that's one field name added to a tuple.

Feel free to contribute it as a PR. This is an open-source project with more than 2000 contributors - you are absolutely welcome to contribute such a change.
[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523004#comment-17523004 ]

ASF GitHub Bot commented on AIRFLOW-3537:

FridayPush commented on PR #4341: URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100510274

Wish this had been merged. A pain point for us that's one field name added to a tuple.
[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says it's queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520659#comment-17520659 ]

ASF GitHub Bot commented on AIRFLOW-5071:

woodywuuu commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1095231809

airflow: 2.2.2 with mysql8, HA scheduler, celery executor (redis backend)

From the logs, the task instances (tis) that reported this error, `killed externally (status: success)`, were rescheduled:

1. scheduler found a ti to schedule (ti from None to scheduled)
2. scheduler queued the ti (ti from scheduled to queued)
3. scheduler sent the ti to celery
4. worker got the ti
5. worker found the ti's state in mysql is scheduled https://github.com/apache/airflow/blob/2.2.2/airflow/models/taskinstance.py#L1224
6. worker set this ti to None
7. scheduler rescheduled this ti
8. scheduler could not queue this ti again, found this ti succeeded (in celery), and so set it to failed

From mysql we see that all failed tasks have no external_executor_id!

We use 5000 dags, each with 50 dummy tasks, and found that if the following two conditions are met, the probability of triggering this problem increases sharply:

1. no external_executor_id was set on a ti queued in celery https://github.com/apache/airflow/blob/2.2.2/airflow/jobs/scheduler_job.py#L537 (this query uses skip_locked, so some tis queued in celery may miss the external_executor_id)
2. a scheduler loop takes very long (more than 60s), so `adopt_or_reset_orphaned_tasks` judges that the SchedulerJob failed and tries to adopt the orphaned tis https://github.com/apache/airflow/blob/9ac742885ffb83c15f7e3dc910b0cf9df073407a/airflow/executors/celery_executor.py#L442

We did these tests:

1. patch `SchedulerJob._process_executor_events` not to set external_executor_id on those queued tis
   * 300+ dags failed with `killed externally (status: success)` (normally fewer than 10)
2. patch `adopt_or_reset_orphaned_tasks` not to adopt orphaned tis
   * no dag failed!

I read the notes [below](https://github.com/apache/airflow/blob/9ac742885ffb83c15f7e3dc910b0cf9df073407a/airflow/executors/celery_executor.py#L442), but still don't understand this problem: why should we handle queued tis in celery and set this external id?
[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says it's queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507936#comment-17507936 ]

ASF GitHub Bot commented on AIRFLOW-5071:

kenny813x201 commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1069820469

We also got the same error message. In our case, it turned out that we were using the same name for different dags. Changing the different dags from `as dag` to, e.g., `as dags1` and `as dags2` solved the issue for us.

```python
with DAG(
    "dag_name",
) as dag:
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (AIRFLOW-1746) Add a Nomad operator to trigger job from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506680#comment-17506680 ]

ASF GitHub Bot commented on AIRFLOW-1746:

shantanugadgil commented on pull request #2708: URL: https://github.com/apache/airflow/pull/2708#issuecomment-1067538261

Is this still planned to be merged?

> Add a Nomad operator to trigger job from Airflow
>
> Key: AIRFLOW-1746
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1746
> Project: Apache Airflow
> Issue Type: New Feature
> Components: contrib
> Reporter: Eyal Trabelsi
> Assignee: Eyal Trabelsi
> Priority: Major
>
> We recently faced the need to trigger nomad jobs from Airflow, and no operator
> is available for that.
> The way the operator works is to register a nomad job and dispatch the job, then
> check the status of the job using a similar method to boto-core
> (https://github.com/boto/botocore/blob/5a07b477114b11e6dc5f676f5db810972565b113/botocore/docs/waiter.py)
> The operator uses https://github.com/jrxFive/python-nomad, which is a wrapper
> over the nomad REST API, written in Python.
> Link to the PR: https://github.com/apache/incubator-airflow/pull/2708
[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says it's queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506060#comment-17506060 ]

ASF GitHub Bot commented on AIRFLOW-5071:

aakashanand92 edited a comment on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1066506022

> We face the same issue with tasks that stay indefinitely in a queued status, except that we don't see tasks as `up_for_retry`. It happens randomly within our DAGs. The task will stay in a queued status forever until we manually make it fail. We **don't use any sensors** at all. We are on an AWS MWAA instance (Airflow 2.0.2).
>
> Example logs:
>
> Scheduler:
>
> ```
> [2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor reports task instance finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
> [2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor reports execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with status failed for try_number 1 in state FAILURE
> ```
>
> Worker:
>
> ```
> [2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task dag_id could not be found: task0. Either the dag did not exist or it failed to parse.
> ```
>
> This is not seen in the worker logs for every occurrence in the scheduler logs.
>
> Because of the MWAA autoscaling mechanism, `worker_concurrency` is not configurable. `worker_autoscale`: `10, 10`, `dagbag_import_timeout`: 120s, `dag_file_processor_timeout`: 50s, `parallelism` = 48, `dag_concurrency` = 1, `max_threads` = 8.
>
> We currently have 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) workers.

@val2k Did you find a solution for this? I am also using an MWAA environment and facing the same issue. The tasks get stuck in the queued state, and when I look at the scheduler logs I can see the same error: "Executor reports task instance %s finished (%s) although the task says its %s. (Info: %s) Was the task killed externally?" I tried everything I can find in this thread but nothing seems to be working.
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506007#comment-17506007 ] ASF GitHub Bot commented on AIRFLOW-5071: - ghostbody commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1066406484 @turbaszek Let me make a PR later~ We are doing pressure tests these days and this problem has appeared often.
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505135#comment-17505135 ] ASF GitHub Bot commented on AIRFLOW-5071: - turbaszek commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1065613233 @ghostbody do you have an idea of how this can be addressed?
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501136#comment-17501136 ] ASF GitHub Bot commented on AIRFLOW-5071: - ghostbody commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1058801920

After struggling, we found a method to reproduce this issue 100% of the time!

tl;dr: https://github.com/apache/airflow/blob/9ac742885ffb83c15f7e3dc910b0cf9df073407a/airflow/models/taskinstance.py#L1253

Add a `raise` there to simulate a DB error, which is likely to happen when the DB is under great pressure. You will then get the `Was the task killed externally` error every time.

Conditions:
- Airflow 2.2
- Celery Executor

It happens because the worker uses a local task job, which spawns a child process to execute the task. The parent process sets the task from the `Queued` to the `Running` state. However, when the preparatory work in the parent process fails, it leads directly to this error. The related code is here: https://github.com/apache/airflow/blob/2.2.2/airflow/jobs/local_task_job.py#L89
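The failure mode ghostbody describes can be modeled in a few lines. This is a hedged toy model, not Airflow code: `local_task_job_run` and `scheduler_check` are hypothetical stand-ins for the parent-process setup in `local_task_job.py` and for the scheduler's executor-event check.

```python
from enum import Enum

class State(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"

class TaskInstance:
    def __init__(self):
        self.state = State.QUEUED

def local_task_job_run(ti, simulate_db_error=False):
    """Toy parent process: flip the TI to RUNNING, then let the task finish.

    The raise mimics adding a `raise` in taskinstance.py to simulate a DB
    error under load: the TI then never leaves QUEUED.
    """
    if simulate_db_error:
        raise RuntimeError("simulated DB error during task prep")
    ti.state = State.RUNNING
    ti.state = State.SUCCESS

def scheduler_check(ti, executor_report):
    """Toy scheduler check: anomalous when the executor reports a terminal
    state while the TI is still QUEUED ("Was the task killed externally?")."""
    return executor_report in (State.SUCCESS, State.FAILED) and ti.state == State.QUEUED

ti = TaskInstance()
try:
    local_task_job_run(ti, simulate_db_error=True)
    report = State.SUCCESS
except RuntimeError:
    report = State.FAILED  # the executor-side attempt is reported as failed
anomaly = scheduler_check(ti, report)  # True: the spurious error fires
```

With `simulate_db_error=False` the TI reaches SUCCESS and `scheduler_check` stays quiet, matching the claim that the error only appears when the parent's preparatory work fails.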
[jira] [Commented] (AIRFLOW-6405) Bigquery Update Table Properties Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500879#comment-17500879 ] ASF GitHub Bot commented on AIRFLOW-6405: - rolldeep commented on pull request #7126: URL: https://github.com/apache/airflow/pull/7126#issuecomment-1058249946

Hi @jithin-sukumar, thanks for your operator! Unfortunately, I can't figure out how to use it. Currently I want to upsert a delta table into a main table, but there are no parameters for the destination (main) table. Here is my operator:

```python
upsert_table = BigQueryUpsertTableOperator(
    task_id="upsert_table",
    dataset_id="DATASET_NAME",
    table_resource={
        "tableReference": {"tableId": f"{config.get('TABLE_NAME')}"},
        "expirationTime": (int(time.time()) + 300) * 1000,
    },
)
```

The problem is that I can't choose the destination (main) table. @jithin-sukumar, can you explain how I can set up my destination table? As I see it, the current implementation uses tableReference for both source and destination.

> Bigquery Update Table Properties Operator
> Key: AIRFLOW-6405
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6405
> Project: Apache Airflow
> Issue Type: New Feature
> Components: gcp, operators
> Affects Versions: 1.10.7
> Reporter: Jithin Sukumar
> Assignee: Jithin Sukumar
> Priority: Minor
> Fix For: 2.0.0
>
> Currently, Airflow doesn't seem to support BigQuery update table operations [1] (specifically, to update the properties of a table). Is this already under development?
> (The use case is conditionally updating the `expiration time` of BQ tables.)
> References:
> [1]: https://cloud.google.com/bigquery/docs/managing-tables
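For what it's worth, a hedged reading of the operator (an assumption based on the provider's `run_table_upsert` hook method; verify against the installed provider): there is no separate source/destination pair because the operator only creates-or-updates one table, the one named by `tableReference.tableId` inside `dataset_id`. A toy sketch of that decision logic, where `run_table_upsert` is a simplified stand-in rather than the real hook:

```python
def run_table_upsert(existing_tables, dataset_id, table_resource):
    """Simplified stand-in for the hook's upsert decision: patch the table
    if it already exists in the dataset, otherwise insert (create) it."""
    table_id = table_resource["tableReference"]["tableId"]
    if table_id in existing_tables.get(dataset_id, []):
        return ("patch", dataset_id, table_id)
    return ("insert", dataset_id, table_id)

# The "destination" is therefore dataset_id plus tableReference.tableId:
action = run_table_upsert(
    {"my_dataset": ["main_table"]},
    "my_dataset",
    {"tableReference": {"tableId": "main_table"}},
)
```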
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495037#comment-17495037 ] ASF GitHub Bot commented on AIRFLOW-5071: - omoumniabdou commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1046086644 The problem for us was that we had one DAG that reached 32 parallel runnable tasks (32 leaf tasks), which was the value of the `parallelism` parameter. After this, the scheduler was not able to run (or queue) any task. Increasing this parameter solved the problem for us.
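For context, `parallelism` is a global cap set in the `[core]` section of `airflow.cfg` (32 was the long-standing default, which matches the 32 runnable tasks above; the value below is illustrative, not a recommendation):

```ini
[core]
; Maximum number of task instances allowed to run concurrently across the
; entire installation; runnable tasks beyond this cap stay queued.
parallelism = 128
```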
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494898#comment-17494898 ] ASF GitHub Bot commented on AIRFLOW-5071: - ghostbody commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1045871873 @pbotros No, we have not solved this problem yet.
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493003#comment-17493003 ] ASF GitHub Bot commented on AIRFLOW-5071: - pbotros commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1041101221 We also run into this fairly often, despite not using any sensors. We only seemed to start getting this error once we moved our Airflow database into the cloud (AWS RDS); our Airflow webserver and scheduler run on desktop workstations on-premises. As others have suggested in this thread, this is a very annoying problem that requires manual intervention. @ghostbody any progress on determining whether that's the correct root cause?
[jira] [Commented] (AIRFLOW-6602) Make "executor_config" templated field to support dynamic parameters for kubernetes executor
[ https://issues.apache.org/jira/browse/AIRFLOW-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489071#comment-17489071 ] ASF GitHub Bot commented on AIRFLOW-6602: - BayanAzima commented on pull request #7230: URL: https://github.com/apache/airflow/pull/7230#issuecomment-1032967030

I'd like to see this feature, as I'd like to template out a `sub_path` of a PV I have so that I don't expose DAG run data I store there to other DAG runs; see the example below. I notice that this issue is pretty old; was this added in v2, or is there another way I can do this now?

```python
KUBERNETES_WORKSPACE_PVC = {
    "pod_override": k8s.V1Pod(
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",
                    volume_mounts=[
                        k8s.V1VolumeMount(
                            name=Constants.KUBERNETES_WORKSPACE_PVC,
                            mount_path="/opt/airflow/mnt/workspace",
                            # sub_path="{{ ti.xcom_pull(key='run_group_id') }}",
                            read_only=False,
                        )
                    ],
                )
            ],
            volumes=[
                k8s.V1Volume(
                    name=Constants.KUBERNETES_WORKSPACE_PVC,
                    persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
                        claim_name=Constants.KUBERNETES_WORKSPACE_PVC
                    ),
                )
            ],
        )
    ),
}
```

> Make "executor_config" templated field to support dynamic parameters for kubernetes executor
> Key: AIRFLOW-6602
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6602
> Project: Apache Airflow
> Issue Type: New Feature
> Components: executor-kubernetes, executors
> Affects Versions: 1.10.7
> Reporter: Jun Xie
> Assignee: Jun Xie
> Priority: Major
>
> When running Airflow with the Kubernetes Executor, one specifies the configuration through "executor_config". At the moment, this field is not templated, meaning that we won't be able to have dynamic parameters. So I did an experiment: I created MyPythonOperator, which inherits PythonOperator but with "executor_config" added to template_fields. However, the result shows that this change by itself isn't enough, because Airflow first creates a Pod based on executor_config without rendering it, and then runs the task inside the pod (the run is what triggers the Jinja template rendering).
> See an example config below showing a use case where one can mount a dynamic "subPath" into the image:
> {code:java}
> executor_config = {
>     "KubernetesExecutor": {
>         "image": "some_image",
>         "request_memory": "2Gi",
>         "request_cpu": "1",
>         "volumes": [
>             {
>                 "name": "data",
>                 "persistentVolumeClaim": {"claimName": "some_claim_name"},
>             },
>         ],
>         "volume_mounts": [
>             {
>                 "mountPath": "/code",
>                 "name": "data",
>                 "subPath": "/code/{{ dag_run.conf['branch_name'] }}",
>             },
>         ],
>     }
> }
> {code}
> I then did a further experiment: in trigger_tasks() from airflow/executors/base_executor.py, right before execute_async() is called, I called simple_ti.render_templates(), which triggers the rendering, so kubernetes_executor.execute_async() will pick up the resolved parameters.
> {code:java}
> # current behavior
> for i in range(min(open_slots, len(self.queued_tasks))):
>     key, (command, _, queue, simple_ti) = sorted_queue.pop(0)
>     self.queued_tasks.pop(key)
>     self.running[key] = command
>     self.execute_async(key=key,
>                        command=command,
>                        queue=queue,
>                        executor_config=simple_ti.executor_config)
> {code}
> {code:java}
> # Proposed new behavior:
> for i in range(min(open_slots, len(self.queued_tasks))):
>     key, (command, _, queue, simple_ti) = sorted_queue.pop(0)
>     self.queued_tasks.pop(key)
>     self.running[key] = command
>     simple_ti.render_templates()  # render it
>     self.execute_async(key=key,
>                        command=command,
>                        queue=queue,
>                        executor_config=simple_ti.executor_config)
> {code}
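The rendering step proposed in that issue can be illustrated without Airflow. In this hedged sketch, `render_nested` is a hypothetical helper, and plain `str.format` placeholders stand in for Jinja's `{{ ... }}` syntax:

```python
def render_nested(obj, context):
    """Recursively render every string in a nested dict/list structure,
    mimicking what templating executor_config before execute_async() would do."""
    if isinstance(obj, str):
        return obj.format(**context)  # stand-in for Jinja rendering
    if isinstance(obj, dict):
        return {k: render_nested(v, context) for k, v in obj.items()}
    if isinstance(obj, list):
        return [render_nested(v, context) for v in obj]
    return obj

executor_config = {
    "KubernetesExecutor": {
        "volume_mounts": [
            {"mountPath": "/code", "name": "data", "subPath": "/code/{branch_name}"},
        ],
    }
}

# Resolve templates before the executor consumes the config.
rendered = render_nested(executor_config, {"branch_name": "feature-x"})
```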
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476036#comment-17476036 ] ASF GitHub Bot commented on AIRFLOW-5071: - val2k edited a comment on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1012934214

We face the same issue with tasks that stay indefinitely in a queued status, except that we don't see tasks as `up_for_retry`. It happens randomly within our DAGs. The task will stay in a queued status forever until we manually make it fail. We **don't use any sensors** at all. We are on an AWS MWAA instance (Airflow 2.0.2).

Example logs. Scheduler:
```
[2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor reports task instance finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
[2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor reports execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with status failed for try_number 1 in state FAILURE
```

Worker:
```
[2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task dag_id could not be found: task0. Either the dag did not exist or it failed to parse.
```
This is not seen in the worker logs for every occurrence in the scheduler logs.

Because of the MWAA autoscaling mechanism, `worker_concurrency` is not configurable. `worker_autoscale`: `10, 10`; `dagbag_import_timeout`: 120s; `dag_file_processor_timeout`: 50s; `parallelism` = 48; `dag_concurrency` = 1; `max_threads` = 8.

We currently have 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) workers.
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475121#comment-17475121 ] ASF GitHub Bot commented on AIRFLOW-5071: - ghostbody edited a comment on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804

We reviewed the code and found that in `local_task_job.py`, the parent process has a `heartbeat_callback` that checks the state and the child-process return code of the `task_instance`. However, these lines may hide a bug:

![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)

**The raw task command writing back the task instance's state (e.g. success) doesn't mean the child process has finished (returned).** So, in this heartbeat callback, there may be a race condition where the task state has been written back while the child process has not yet returned. In this scenario, the local task job will kill the child process by mistake. The scheduler will then detect this and report "task instance X finished (success) although the task says its queued. Was the task killed externally?"

This is a simple schematic diagram:

![image](https://user-images.githubusercontent.com/8371330/149273573-45700f32-079b-4b22-8dba-d6a1ce37a243.png)
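The race described above can be made concrete with a small, hedged toy model (threads stand in for the parent/child processes and a dict for the database row; none of this is Airflow code). The window exists because the state write and the process exit are not atomic:

```python
import threading

task = {"state": "queued", "returned": False}
state_written = threading.Event()  # the child has persisted its final state
release = threading.Event()        # lets the child finish its teardown

def child():
    # The raw task command writes back the terminal state first...
    task["state"] = "success"
    state_written.set()
    # ...but the process has not returned yet (teardown still in progress).
    release.wait()
    task["returned"] = True

def heartbeat():
    # Mimics the heartbeat_callback: terminal state recorded, yet no return
    # code from the child. Treating that as an anomaly kills a healthy child.
    if task["state"] == "success" and not task["returned"]:
        return "kill child"
    return "ok"

t = threading.Thread(target=child)
t.start()
state_written.wait()   # the heartbeat fires inside the race window
verdict = heartbeat()  # "kill child", even though the task was finishing
release.set()
t.join()
```

Tolerating "terminal state, child still running" for a grace period before killing would close this particular window.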
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475117#comment-17475117 ] ASF GitHub Bot commented on AIRFLOW-5071: - ghostbody edited a comment on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804 we reviewed the code and found that in `local_task_job.py`, the parent process has a `heatbeat_callback`, and will check the state and child process return code of the `task_instance`. However, theses lines may cover a bug? ![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png) ![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png) **The raw task command write back the taskintance's state(like sucess) doesn't mean the child process is finished(returned)?** So, in this heatbeat callback, there maybe a race condition when task state is filled back while the child process is not returned. In this senario, the local task will kill the child process by mistake. And then, the scheduler will checkout this and report "task instance X finished (success) although the task says its queued. Was the task killed externally?" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Thousand os Executor reports task instance X finished (success) although the > task says its queued. Was the task killed externally? 
> -- > > Key: AIRFLOW-5071 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5071 > Project: Apache Airflow > Issue Type: Bug > Components: DAG, scheduler >Affects Versions: 1.10.3 >Reporter: msempere >Priority: Critical > Fix For: 1.10.12 > > Attachments: image-2020-01-27-18-10-29-124.png, > image-2020-07-08-07-58-42-972.png > > > I'm opening this issue because since I update to 1.10.3 I'm seeing thousands > of daily messages like the following in the logs: > > ``` > {{__init__.py:1580}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally? > {{jobs.py:1484}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally? > ``` > -And looks like this is triggering also thousand of daily emails because the > flag to send email in case of failure is set to True.- > I have Airflow setup to use Celery and Redis as a backend queue service. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475116#comment-17475116 ] ASF GitHub Bot commented on AIRFLOW-5071: - ghostbody edited a comment on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475115#comment-17475115 ] ASF GitHub Bot commented on AIRFLOW-5071: - ghostbody commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467689#comment-17467689 ] ASF GitHub Bot commented on AIRFLOW-4922: - derkuci commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-1003763219 @xuemengran nvm. I guess you meant the PR suggested here. I tried that; the log table has changed, and I couldn't match the information easily. The `execution_date` column is None when `event="cli_task_run"`, which makes filtering impossible. I understand why the PR was rejected. For cases where the logs exist but the web UI couldn't locate the correct hostname, the issue is that the "task_instance" table only stores the latest `try_number`/`hostname` for a task run (as already indicated by @ITriangle). The PK doesn't include `try_number`. It would be better to fix the task_instance table, which is more fundamental, though that would probably intimidate most "amateurs" (like me). > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the logs show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. 
Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. > https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.20.1#820001)
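The "No host supplied" failure mode quoted above follows directly from building the worker log URL around a hostname column that was never committed. A minimal sketch, with illustrative function names (these are not Airflow's real log-fetching helpers, and 8793 is the default worker log-server port mentioned in the logs):

```python
# Sketch of why a crashed task yields "Invalid URL 'http://:8793/...':
# No host supplied": the task_instance row's hostname is only committed
# when the task finishes, so a crash leaves it empty and the fetch URL
# has no host. Function names here are hypothetical.

from urllib.parse import urlsplit

def build_worker_log_url(hostname, dag_id, task_id, execution_date,
                         try_number, worker_log_server_port=8793):
    return (f"http://{hostname}:{worker_log_server_port}/log/"
            f"{dag_id}/{task_id}/{execution_date}/{try_number}.log")

def fetch_log(hostname, *args, **kwargs):
    url = build_worker_log_url(hostname, *args, **kwargs)
    if not urlsplit(url).hostname:
        # The row was never committed, so there is no host to fetch from.
        return f"Invalid URL {url!r}: No host supplied"
    return f"would fetch {url}"

# Crashed task: hostname never committed -> empty string.
print(fetch_log("", "my_dag", "my_task", "2019-07-07T09:00:00+00:00", 1))
# -> Invalid URL 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host supplied
```

The same shape also illustrates derkuci's point below: because only the latest hostname is stored per task run, even a non-empty hostname can point at the wrong worker for an earlier `try_number`.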
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467682#comment-17467682 ] ASF GitHub Bot commented on AIRFLOW-4922: - derkuci commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-1003754230 > In the 2.1.1 version, I tried to modify the airflow/utils/log/file_task_handler.py file to obtain the hostname information by reading the log table. I confirmed through debug that I could get the host information in this way, @xuemengran could you kindly point to how this could be done? With airflow 2.2.2 + Celery, I am seeing error messages like the one below because `TaskInstance.hostname` is always the latest and does not depend on the `try_number`. ``` "Failed to fetch log file from worker. Client error '404 NOT FOUND' for url ..." ``` If we try really hard, the logs can be found in the local storage of _some_ Celery workers, but that is a huge burden for operations and debugging.
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454196#comment-17454196 ] ASF GitHub Bot commented on AIRFLOW-6786: - potiuk commented on pull request #12388: URL: https://github.com/apache/airflow/pull/12388#issuecomment-987086352 > @potiuk I have a same requirement. If I am able to implement it. Will raise a PR. Cool! > Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor > > > Key: AIRFLOW-6786 > URL: https://issues.apache.org/jira/browse/AIRFLOW-6786 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, hooks >Affects Versions: 1.10.9 >Reporter: Daniel Ferguson >Assignee: Daniel Ferguson >Priority: Minor > > Add the KafkaProducerHook. > Add the KafkaConsumerHook. > Add the KafkaSensor which listens to messages with a specific topic. > Related Issue: > #1311 (Pre-dates Jira Migration) > Reminder to contributors: > You must add an Apache License header to all new files > Please squash your commits when possible and follow the 7 rules of good Git > commits > I am new to the community, I am not sure the files are at the right place or > missing anything. > The sensor could be used as the first node of a dag where the second node can > be a TriggerDagRunOperator. The messages are polled in a batch and the dag > runs are dynamically generated. > Thanks! > Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], > it is important to mention these integrations are not suitable for > low-latency/high-throughput/streaming. For reference, [#1415 > (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806]. 
> Co-authored-by: Dan Ferguson > [dferguson...@gmail.com|mailto:dferguson...@gmail.com] > Co-authored-by: YuanfΞi Zhu -- This message was sent by Atlassian Jira (v8.20.1#820001)
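The batch-poll behaviour the ticket describes (the sensor succeeds once a batch of messages arrives, then a downstream TriggerDagRunOperator fans out runs) can be sketched independently of any broker. The `poke_for_messages` name and the injected-consumer shape below are assumptions for illustration, not the API of the proposed hooks; a real run would pass something like a confluent-kafka `Consumer` subscribed to the topic instead of the fake.

```python
# Hedged sketch of the described sensor logic: drain up to batch_size
# messages from a topic and report success if anything arrived. The
# consumer is injected, so any object with a poll(timeout) method works.

def poke_for_messages(consumer, batch_size=5, timeout=1.0):
    """Poll up to batch_size messages; return (found, batch)."""
    messages = []
    for _ in range(batch_size):
        msg = consumer.poll(timeout)
        if msg is None:  # nothing more waiting on the topic
            break
        messages.append(msg)
    return bool(messages), messages

# Stand-in consumer with the poll(timeout) shape used by Kafka clients;
# purely for demonstration, no broker involved.
class FakeConsumer:
    def __init__(self, queued):
        self._queued = list(queued)

    def poll(self, timeout):
        return self._queued.pop(0) if self._queued else None

found, batch = poke_for_messages(FakeConsumer([b"evt-1", b"evt-2"]))
print(found, len(batch))  # -> True 2
```

Keeping the poll loop bounded by `batch_size` matches the ticket's caveat that this pattern suits batch triggering, not low-latency/high-throughput streaming.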
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454186#comment-17454186 ] ASF GitHub Bot commented on AIRFLOW-6786: - debashis-das commented on pull request #12388: URL: https://github.com/apache/airflow/pull/12388#issuecomment-987056630 @potiuk I have the same requirement. If I am able to implement it, I will raise a PR.
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453706#comment-17453706 ] ASF GitHub Bot commented on AIRFLOW-4922: - potiuk edited a comment on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-986311760 Well, @blotouta2, I think this issue discusses 3 or 4 different problems, so your comment is pretty meaningless. Also, the issue is closed, so it's likely a different issue altogether. If you REALLY want to get help, just open a new issue and provide all the details you can (and ideally a reproducible case). Or, if you do not have a reproducible case, provide as much information as you can and open a GitHub Discussion. And BTW, if you are using an older version of Airflow, just upgrade to the newest and check - Airflow follows SemVer, so it should be rather safe to upgrade.
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453705#comment-17453705 ] ASF GitHub Bot commented on AIRFLOW-4922: - potiuk commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-986311760
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453287#comment-17453287 ] ASF GitHub Bot commented on AIRFLOW-4922: - blotouta2 commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-985965125 I have also got the same exception. Was this issue fixed in any version?
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452606#comment-17452606 ] ASF GitHub Bot commented on AIRFLOW-6786: - potiuk commented on pull request #12388: URL: https://github.com/apache/airflow/pull/12388#issuecomment-984991868 unbelievable (!) you have not done it yet @serge-salamanka-1pt !
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452605#comment-17452605 ] ASF GitHub Bot commented on AIRFLOW-6786: - potiuk commented on pull request #12388: URL: https://github.com/apache/airflow/pull/12388#issuecomment-984991544 @serge-salamanka-1pt - maybe you would like to contribute it? Airflow is created by >1800 contributors and you can become one and add Kafka support! The OSS world works this way.
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452280#comment-17452280 ] ASF GitHub Bot commented on AIRFLOW-6786: - serge-salamanka-1pt commented on pull request #12388: URL: https://github.com/apache/airflow/pull/12388#issuecomment-984473014 unbelievable (!) Airflow does not support Kafka out of the box yet!?
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451785#comment-17451785 ] ASF GitHub Bot commented on AIRFLOW-4922: - danilocurvelo commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-983607427 Exactly the same error as @vdusek posted in Airflow 2.1.x. Was this fixed in Airflow 2.2.x?
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450537#comment-17450537 ] ASF GitHub Bot commented on AIRFLOW-4922: - taohuzefu edited a comment on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-981768485 > Hi, I am still seeing the issue in 2.1.1 version, my executor is celery @xuemengran Hi, how about now? Did you fix that?
[jira] [Commented] (AIRFLOW-3702) Reverse Backfilling (from current date to start date)
[ https://issues.apache.org/jira/browse/AIRFLOW-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17441053#comment-17441053 ] ASF GitHub Bot commented on AIRFLOW-3702: - aashayVimeo commented on pull request #4676: URL: https://github.com/apache/airflow/pull/4676#issuecomment-964018648 hi @dima-asana @feng-tao Any updates on the above comment for running backwards dag from dag definition. ``` from airflow import DAG dag = DAG( dag_id = "...", scheduled_interval = "@daily", run_backwards = True, ... ) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Reverse Backfilling(from current date to start date) > > > Key: AIRFLOW-3702 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3702 > Project: Apache Airflow > Issue Type: Improvement > Components: configuration, DAG, models >Affects Versions: 1.10.1 > Environment: MacOS High Sierra > python2.7 > Airflow-1.10.1 >Reporter: Shubham Gupta >Assignee: Dima Kamalov >Priority: Major > Labels: critical, improvement, priority > Fix For: 1.10.3 > > Original Estimate: 336h > Remaining Estimate: 336h > > Hello, > I think there is a need to have reverse backfilling option as well because > recent jobs would take precedence over the historical jobs. We can come up > with some variable in the DAG such as dagrun_order_default = True/False . > This would help in many use cases, in which previous date pipeline does not > depends on current pipeline. > I saw this page which talks about this -> > http://mail-archives.apache.org/mod_mbox/airflow-dev/201804.mbox/%3CCAPUwX3M7_qrn=1bqysmkdv_ifjbta6lbtq7czhhexszmdjk...@mail.gmail.com%3E > Thanks! > Regards, > Shubham -- This message was sent by Atlassian Jira (v8.20.1#820001)
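The `run_backwards` DAG argument asked about in the comment above is, as far as the thread shows, still an open question; what did land (via the backfill work referenced here) is a CLI-level flag on `airflow dags backfill` in Airflow 2.x. The ordering logic itself can be sketched without Airflow at all — generate the schedule's execution dates and reverse them. This is a hypothetical stand-alone sketch, not Airflow's actual backfill code; the function name and signature are illustrative only.

```python
from datetime import datetime, timedelta

def backfill_dates(start, end, interval, run_backwards=False):
    """Return the execution dates for a backfill window.

    With run_backwards=True the most recent interval is processed first,
    which is the behaviour requested in the comment above (illustrative
    sketch only, not Airflow's scheduler code).
    """
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += interval
    return list(reversed(dates)) if run_backwards else dates

forward = backfill_dates(datetime(2021, 1, 1), datetime(2021, 1, 3), timedelta(days=1))
backward = backfill_dates(datetime(2021, 1, 1), datetime(2021, 1, 3), timedelta(days=1),
                          run_backwards=True)
```

The same effect is available today from the CLI (rather than the DAG definition) via `airflow dags backfill --run-backwards`, assuming a 2.x install.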
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438722#comment-17438722 ] ASF GitHub Bot commented on AIRFLOW-4922: - zhengxianh removed a comment on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-957984427 Hi, I'm trying to set up Airflow 2.1.4 with docker on multiple machines, but failed when the webserver is not able to access the task logs. Can anyone help me out please? I've created a post in stackoverflow with details: https://stackoverflow.com/questions/68694805/airflow-webserver-not-able-to-access-remote-worker-logs If you need any other info, please let me know. Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
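The "Invalid URL ... No host supplied" symptom quoted in the issue follows directly from formatting the worker log-fetch URL with an empty hostname: because the task-instance row (including its hostname) is only committed after the task finishes, a crashed task leaves the field blank. A minimal sketch of that failure mode — the function name and parameters here are illustrative, not Airflow's actual helper:

```python
def worker_log_url(hostname, log_relative_path, worker_log_server_port=8793):
    """Rebuild the URL the webserver uses to fetch logs from a worker
    (simplified illustration of the pattern, not Airflow's real code)."""
    return f"http://{hostname}:{worker_log_server_port}/log/{log_relative_path}"

# With the hostname never committed, the URL has no host -- exactly the
# malformed "http://:8793/log/..." URL shown in the issue description:
bad = worker_log_url("", "my_dag/my_task/2019-07-07T09:00:00+00:00/1.log")
assert bad.startswith("http://:8793/")
```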
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437616#comment-17437616 ] ASF GitHub Bot commented on AIRFLOW-4922: - zhengxianh commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-957984427 Hi, I'm trying to set up Airflow 2.1.4 with docker on multiple machines, but failed when the webserver is not able to access the task logs. Can anyone help me out please? I've created a post in stackoverflow with details: https://stackoverflow.com/questions/68694805/airflow-webserver-not-able-to-access-remote-worker-logs If you need any other info, please let me know. Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5911) Simplify lineage support and improve robustness
[ https://issues.apache.org/jira/browse/AIRFLOW-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433558#comment-17433558 ] ASF GitHub Bot commented on AIRFLOW-5911: - alokjain-01 commented on pull request #6564: URL: https://github.com/apache/airflow/pull/6564#issuecomment-950526422 is this problem of lineage fixed in Airflow 2.0 onward. I am using 2.1.4 and still not getting any lineage in Apache atlas. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Simplify lineage support and improve robustness > --- > > Key: AIRFLOW-5911 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5911 > Project: Apache Airflow > Issue Type: Sub-task > Components: lineage >Affects Versions: 1.10.6 >Reporter: Bolke de Bruin >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432763#comment-17432763 ] ASF GitHub Bot commented on AIRFLOW-5071: - nguyenmphu edited a comment on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-949189802 I found that in the code of `airflow/jobs/scheduler_job.py`: https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job.py#L535 ``` python if ti.try_number == buffer_key.try_number and ti.state == State.QUEUED: Stats.incr('scheduler.tasks.killed_externally') msg = ( "Executor reports task instance %s finished (%s) although the " "task says its %s. (Info: %s) Was the task killed externally?" ) self.log.error(msg, ti, state, ti.state, info) ``` The scheduler checks the state of the task instance. When a task instance is rescheduled (e.g: an external sensor), its state transition up_for_reschedule -> scheduled -> queued -> running. If its state is queued and not moved to the running state, the scheduler will raise an error. So I think the code needs to be changed: ``` python if ti.try_number == buffer_key.try_number and ( ti.state == State.QUEUED and not TaskReschedule.find_for_task_instance(ti, session=session) ): Stats.incr('scheduler.tasks.killed_externally') msg = ( "Executor reports task instance %s finished (%s) although the " "task says its %s. (Info: %s) Was the task killed externally?" ) self.log.error(msg, ti, state, ti.state, info) ``` Here is my PR: https://github.com/apache/airflow/pull/19123 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Thousand os Executor reports task instance X finished (success) although the > task says its queued. 
Was the task killed externally? > -- > > Key: AIRFLOW-5071 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5071 > Project: Apache Airflow > Issue Type: Bug > Components: DAG, scheduler >Affects Versions: 1.10.3 >Reporter: msempere >Priority: Critical > Fix For: 1.10.12 > > Attachments: image-2020-01-27-18-10-29-124.png, > image-2020-07-08-07-58-42-972.png > > > I'm opening this issue because since I update to 1.10.3 I'm seeing thousands > of daily messages like the following in the logs: > > ``` > {{__init__.py:1580}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally? > {{jobs.py:1484}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally? > ``` > -And looks like this is triggering also thousand of daily emails because the > flag to send email in case of failure is set to True.- > I have Airflow setup to use Celery and Redis as a backend queue service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
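The fix proposed in the comment above adds a `TaskReschedule` lookup so that a QUEUED task instance that is merely between sensor reschedules is no longer reported as externally killed. The guard can be illustrated with plain-Python stand-ins (the real change queries the metadata database in `airflow/jobs/scheduler_job.py`; the names below are illustrative):

```python
# Hypothetical stand-ins for Airflow's State constants and the
# TaskReschedule lookup -- a sketch of the proposed guard, not the real code.
QUEUED, RUNNING = "queued", "running"

def should_report_killed(ti_state, ti_try_number, buffer_try_number, has_reschedule):
    """Only flag a QUEUED task instance as externally killed when it has
    no pending reschedule record for the same try number."""
    return (
        ti_try_number == buffer_try_number
        and ti_state == QUEUED
        and not has_reschedule
    )

# A sensor that is simply between reschedules should NOT raise the error:
assert not should_report_killed(QUEUED, 1, 1, has_reschedule=True)
# A genuinely stuck queued task still should:
assert should_report_killed(QUEUED, 1, 1, has_reschedule=False)
```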
[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430385#comment-17430385 ] ASF GitHub Bot commented on AIRFLOW-5071: - jledru-redoute commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-946459118 Hello, Still on version 1.10.12 managed by cloud composer but we are intending to move quite quickly to airflow 2. But it seems that this issue is not really resolved on the version 2. We are experiencing this issue not every day, but quite often and always on the same dags. Those dags are dynamically generated by the same python file in airflow based on conf files scanning. It took generally around 12s to parse, so I don't think this is the issue. It looks like this : ``` for country in DAG_PARAMS['countries']: for audience_type in AUDIENCES_TYPE: # get audiences conf file to generate the dags conf_files = glob.glob( f"/home/airflow/gcs/data/CAM/{ country['country_code'] }/COMPOSER_PARAM_SOURCES/{ audience_type['type'] }/*") audiences_list = [] for conf_file in conf_files: string_conf = open(conf_file, 'rb').read().decode("UTF-8") audiences_list.append(json.loads(string_conf)) for letter in ascii_uppercase: dag_aud_list = [ aud for aud in audiences_list if aud["CATEG_CODE"][0] == letter] if dag_aud_list: dag = create_dag(audience_type, country, dag_aud_list) globals()[ f"{ audience_type['type'] }_{ country['country_code'] }_{ letter }_dag"] = dag ``` I understand it is not quite recommanded (however what is preco for this type of DAG) but that's the way it is done. It generates for now around 10 dags with approx 35 init sensor in reschedule mode every 20 minutes. Worker machine is n1-standard-4 set with worker_concurrency at 24. 
Therefore yesterday on 35 celerys task set to be reschedule, 32 of them were rescheduled on the same worker (there are 3 workers) at quite the same time (I'm not sure how to see of the worker concurrency was respected or not but I doubt it) causing 17 of them to fail with this specific issue ... If I understand, set worker_autoscale to "4,2" (and keeping worker_concurrency to 24) would resolve the issue ? Thanks, -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Thousand os Executor reports task instance X finished (success) although the > task says its queued. Was the task killed externally? > -- > > Key: AIRFLOW-5071 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5071 > Project: Apache Airflow > Issue Type: Bug > Components: DAG, scheduler >Affects Versions: 1.10.3 >Reporter: msempere >Priority: Critical > Fix For: 1.10.12 > > Attachments: image-2020-01-27-18-10-29-124.png, > image-2020-07-08-07-58-42-972.png > > > I'm opening this issue because since I update to 1.10.3 I'm seeing thousands > of daily messages like the following in the logs: > > ``` > {{__init__.py:1580}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally? > {{jobs.py:1484}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally? > ``` > -And looks like this is triggering also thousand of daily emails because the > flag to send email in case of failure is set to True.- > I have Airflow setup to use Celery and Redis as a backend queue service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-3739) Make start_date optional
[ https://issues.apache.org/jira/browse/AIRFLOW-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417296#comment-17417296 ] ASF GitHub Bot commented on AIRFLOW-3739: - BasPH commented on pull request #5423: URL: https://github.com/apache/airflow/pull/5423#issuecomment-922438443 It's been a while but I can create a new PR -- still think it's valuable for DAGs that are triggered only manually. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Make start_date optional > > > Key: AIRFLOW-3739 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3739 > Project: Apache Airflow > Issue Type: Improvement > Components: api >Affects Versions: 1.10.3 >Reporter: Anatoli Babenia >Priority: Major > > I want to define DAG, but not schedule it for running. > ``` > from airflow import DAG > from airflow.operators.bash_operator import BashOperator > dag = DAG('115', schedule_interval="@daily") > seed = BashOperator( > task_id='get_seed', > bash_command='date' > ) > dag >> seed > ``` > This fails with the error below. > ``` > airflow.exceptions.AirflowException: Task is missing the start_date parameter > zsh returned exit code 1 > ``` > It it possible to make `start_date` optional. If not, why? -- This message was sent by Atlassian Jira (v8.3.4#803005)
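The `AirflowException` quoted in the issue comes from a validation step that runs when the task is scheduled: the task's own `start_date` wins, otherwise the DAG's default is inherited, otherwise scheduling fails. A minimal stand-in of that resolution (a simplification for illustration, not Airflow's actual lookup in `airflow.models.taskinstance`):

```python
from datetime import datetime

class AirflowException(Exception):
    pass

def resolve_start_date(task_start_date=None, dag_start_date=None):
    """Sketch of how start_date is resolved: task value first, then the
    DAG default; with neither set, scheduling fails with the error quoted
    in the issue (simplified illustration)."""
    start = task_start_date or dag_start_date
    if start is None:
        raise AirflowException("Task is missing the start_date parameter")
    return start
```

In practice, passing `start_date=datetime(2019, 1, 1)` either to the DAG (or its `default_args`) or to the operator itself avoids the error in the report above.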
[jira] [Commented] (AIRFLOW-3739) Make start_date optional
[ https://issues.apache.org/jira/browse/AIRFLOW-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417295#comment-17417295 ] ASF GitHub Bot commented on AIRFLOW-3739: - potiuk edited a comment on pull request #5423: URL: https://github.com/apache/airflow/pull/5423#issuecomment-922436740 The repository for that PR has been deleted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Make start_date optional > > > Key: AIRFLOW-3739 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3739 > Project: Apache Airflow > Issue Type: Improvement > Components: api >Affects Versions: 1.10.3 >Reporter: Anatoli Babenia >Priority: Major > > I want to define DAG, but not schedule it for running. > ``` > from airflow import DAG > from airflow.operators.bash_operator import BashOperator > dag = DAG('115', schedule_interval="@daily") > seed = BashOperator( > task_id='get_seed', > bash_command='date' > ) > dag >> seed > ``` > This fails with the error below. > ``` > airflow.exceptions.AirflowException: Task is missing the start_date parameter > zsh returned exit code 1 > ``` > It it possible to make `start_date` optional. If not, why? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-3739) Make start_date optional
[ https://issues.apache.org/jira/browse/AIRFLOW-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417286#comment-17417286 ] ASF GitHub Bot commented on AIRFLOW-3739: - abitrolly commented on pull request #5423: URL: https://github.com/apache/airflow/pull/5423#issuecomment-922424760 @potiuk why it is closed? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Make start_date optional > > > Key: AIRFLOW-3739 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3739 > Project: Apache Airflow > Issue Type: Improvement > Components: api >Affects Versions: 1.10.3 >Reporter: Anatoli Babenia >Priority: Major > > I want to define DAG, but not schedule it for running. > ``` > from airflow import DAG > from airflow.operators.bash_operator import BashOperator > dag = DAG('115', schedule_interval="@daily") > seed = BashOperator( > task_id='get_seed', > bash_command='date' > ) > dag >> seed > ``` > This fails with the error below. > ``` > airflow.exceptions.AirflowException: Task is missing the start_date parameter > zsh returned exit code 1 > ``` > It it possible to make `start_date` optional. If not, why? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-3739) Make start_date optional
[ https://issues.apache.org/jira/browse/AIRFLOW-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417229#comment-17417229 ] ASF GitHub Bot commented on AIRFLOW-3739: - potiuk closed pull request #5423: URL: https://github.com/apache/airflow/pull/5423 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Make start_date optional > > > Key: AIRFLOW-3739 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3739 > Project: Apache Airflow > Issue Type: Improvement > Components: api >Affects Versions: 1.10.3 >Reporter: Anatoli Babenia >Priority: Major > > I want to define DAG, but not schedule it for running. > ``` > from airflow import DAG > from airflow.operators.bash_operator import BashOperator > dag = DAG('115', schedule_interval="@daily") > seed = BashOperator( > task_id='get_seed', > bash_command='date' > ) > dag >> seed > ``` > This fails with the error below. > ``` > airflow.exceptions.AirflowException: Task is missing the start_date parameter > zsh returned exit code 1 > ``` > It it possible to make `start_date` optional. If not, why? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-6544) Log_id is missing when writing logs to Elasticsearch
[ https://issues.apache.org/jira/browse/AIRFLOW-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417230#comment-17417230 ] ASF GitHub Bot commented on AIRFLOW-6544: - potiuk closed pull request #7141: URL: https://github.com/apache/airflow/pull/7141 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Log_id is not missing when writing logs to elastic search > - > > Key: AIRFLOW-6544 > URL: https://issues.apache.org/jira/browse/AIRFLOW-6544 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.7 >Reporter: Larry Zhu >Assignee: Larry Zhu >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > The “end of log” marker does not include the aforementioned log_id. The issue > is then airflow-web does not know when to stop tailing the logs. > Also it would be better to include an elasticsearch configuration for index > name so that the search is more efficient in big clusters with a lot of > indices -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4374) Inherit from Enum
[ https://issues.apache.org/jira/browse/AIRFLOW-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417228#comment-17417228 ] ASF GitHub Bot commented on AIRFLOW-4374: - potiuk closed pull request #5302: URL: https://github.com/apache/airflow/pull/5302 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Inherit from Enum > - > > Key: AIRFLOW-4374 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4374 > Project: Apache Airflow > Issue Type: Sub-task > Components: core >Reporter: Bas Harenslak >Assignee: Bas Harenslak >Priority: Minor > > Python 3.4 introduced the Enum type, which can reduce the complexity of a few > enum-style classes. E.g. the TriggerRule could become: > {code:java} > from enum import Enum > class TriggerRule(Enum): > ALL_SUCCESS = "all_success" > ALL_FAILED = "all_failed" > ALL_DONE = "all_done" > ONE_SUCCESS = "one_success" > ONE_FAILED = "one_failed" > NONE_FAILED = "none_failed" > NONE_SKIPPED = "none_skipped" > DUMMY = "dummy" > @classmethod > def all_triggers(cls): > return [tr.name for tr in TriggerRule] > {code} > A quick scan showed me enum-like class are TriggerRule, WeightRule, -State-, > but there might be more, so search first. > Also, note this is optional and not required for running in Python 3. > > Edit: not touching the State in this one, since state contains additional > attributes (e.g. state_color) and in Python Enum all attributes are part of > the Enum class, which we don't want in this case. Will have to rethink this > one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
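The `TriggerRule` sketch in the ticket above runs directly on the stdlib `enum` module. One detail worth flagging: the ticket's `all_triggers` returns `tr.name` (uppercase member names such as `ALL_SUCCESS`), whereas using `tr.value` preserves the legacy lowercase strings — the variant shown below. The round-trip from a stored string back to an enum member is also what the Enum form buys over bare class attributes:

```python
from enum import Enum

class TriggerRule(Enum):
    ALL_SUCCESS = "all_success"
    ALL_FAILED = "all_failed"
    ALL_DONE = "all_done"
    ONE_SUCCESS = "one_success"
    ONE_FAILED = "one_failed"
    NONE_FAILED = "none_failed"
    NONE_SKIPPED = "none_skipped"
    DUMMY = "dummy"

    @classmethod
    def all_triggers(cls):
        # .value keeps the legacy lowercase strings; the ticket's .name
        # variant would return the uppercase member names instead.
        return [tr.value for tr in cls]

# Round-trip from the stored string back to the enum member:
assert TriggerRule("all_done") is TriggerRule.ALL_DONE
assert "one_failed" in TriggerRule.all_triggers()
```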
[jira] [Commented] (AIRFLOW-6334) Change CommandType in executors to be a dictionary
[ https://issues.apache.org/jira/browse/AIRFLOW-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417227#comment-17417227 ] ASF GitHub Bot commented on AIRFLOW-6334: - potiuk closed pull request #7085: URL: https://github.com/apache/airflow/pull/7085 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Change CommandType in executors to be a dictionary > -- > > Key: AIRFLOW-6334 > URL: https://issues.apache.org/jira/browse/AIRFLOW-6334 > Project: Apache Airflow > Issue Type: Improvement > Components: executors >Affects Versions: 2.0.0 >Reporter: Tomasz Urbaszek >Priority: Major > > Most of the executors run tasks by running 'airflow tasks run ...`. This is > achieved by passing ['airflow', 'tasks', 'run', ...] object to > subprocess.check_call. I would love to abandon this limiting type and instead > use a dictionary. > Then such dictionary could be passed to new method that will run the task > without using subprocess. -- This message was sent by Atlassian Jira (v8.3.4#803005)
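For illustration, the list-versus-dictionary distinction might look like the sketch below; the field names are hypothetical, not Airflow's actual API:

```python
from typing import Dict, List

# Current shape: a plain argv list handed to subprocess.check_call.
current_command: List[str] = ["airflow", "tasks", "run", "my_dag", "my_task", "2021-01-01"]

def command_to_dict(argv: List[str]) -> Dict[str, str]:
    """Map an `airflow tasks run` argv list into named fields (illustrative only)."""
    _, group, action, dag_id, task_id, execution_date = argv
    return {
        "group": group,
        "action": action,
        "dag_id": dag_id,
        "task_id": task_id,
        "execution_date": execution_date,
    }

# A dictionary can be passed to an in-process runner without re-parsing argv.
print(command_to_dict(current_command)["dag_id"])  # my_dag
```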
[jira] [Commented] (AIRFLOW-4541) Replace mkdirs usage with pathlib
[ https://issues.apache.org/jira/browse/AIRFLOW-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417226#comment-17417226 ] ASF GitHub Bot commented on AIRFLOW-4541: - potiuk closed pull request #5304: URL: https://github.com/apache/airflow/pull/5304 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Replace mkdirs usage with pathlib > - > > Key: AIRFLOW-4541 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4541 > Project: Apache Airflow > Issue Type: Sub-task > Components: core >Reporter: Jarek Potiuk >Assignee: Bas Harenslak >Priority: Major > > _makedirs is used in 'airflow.utils.file.mkdirs' - it could be replaced with > pathlib now with python3.5+_ -- This message was sent by Atlassian Jira (v8.3.4#803005)
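A sketch of the proposed replacement, assuming the wrapper's job is simply to create nested directories idempotently:

```python
import os
import tempfile
from pathlib import Path

base = tempfile.mkdtemp()

# Legacy style, roughly what airflow.utils.file.mkdirs wraps:
os.makedirs(os.path.join(base, "a", "b"), exist_ok=True)

# pathlib equivalent, available since Python 3.5:
target = Path(base, "c", "d")
target.mkdir(parents=True, exist_ok=True)

print(target.is_dir())  # True
```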
[jira] [Commented] (AIRFLOW-139) Executing VACUUM with PostgresOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408173#comment-17408173 ] ASF GitHub Bot commented on AIRFLOW-139: HansBambel commented on pull request #1821: URL: https://github.com/apache/airflow/pull/1821#issuecomment-910291067 If anybody (like me) still faced this issue: you need to add `autocommit=True` to the `PostgresOperator`: ``` vacuum = PostgresOperator( task_id="vacuum-database", postgres_conn_id=db_conn_id, sql="VACUUM FULL;", autocommit=True, ) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Executing VACUUM with PostgresOperator > -- > > Key: AIRFLOW-139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-139 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.7.0 >Reporter: Rafael >Priority: Major > Fix For: 1.8.1 > > > Dear Airflow Maintainers, > h1. Environment > * Airflow version: *v1.7.0* > * Airflow components: *PostgresOperator* > * Python Version: *Python 3.5.1* > * Operating System: *15.4.0 Darwin* > h1. 
Description of Issue > I am trying to execute a `VACUUM` command as part of DAG with the > `PostgresOperator`, which fails with the following error: > {quote} > [2016-05-14 16:14:01,849] {__init__.py:36} INFO - Using executor > SequentialExecutor > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 15, in > args.func(args) > File > "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/bin/cli.py", > line 203, in run > pool=args.pool, > File > "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/models.py", > line 1067, in run > result = task_copy.execute(context=context) > File > "/usr/local/lib/python3.5/site-packages/airflow/operators/postgres_operator.py", > line 39, in execute > self.hook.run(self.sql, self.autocommit, parameters=self.parameters) > File > "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/hooks/dbapi_hook.py", > line 109, in run > cur.execute(s) > psycopg2.InternalError: VACUUM cannot run inside a transaction block > {quote} > I could create a small python script that performs the operation, as > explained in [this stackoverflow > entry](http://stackoverflow.com/questions/1017463/postgresql-how-to-run-vacuum-from-code-outside-transaction-block). > However, I would like to know first if the `VACUUM` command should be > supported by the `PostgresOperator`. > h1. Reproducing the Issue > The operator can be declared as follows: > {quote} > conn = ('postgres_default') > t4 = PostgresOperator( > task_id='vacuum', > postgres_conn_id=conn, > sql=("VACUUM public.table"), > dag=dag > ) > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
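The autocommit requirement is easy to reproduce outside Postgres: SQLite also refuses VACUUM inside a transaction block, so it can stand in here for the psycopg2 behaviour (assumption: analogous semantics; with psycopg2 the equivalent fix is enabling autocommit, as the comment above shows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")  # implicitly opens a transaction

try:
    conn.execute("VACUUM")  # fails: still inside the INSERT's transaction
except sqlite3.OperationalError as exc:
    print("rejected:", exc)

conn.commit()            # end the transaction -- the analogue of autocommit=True
conn.execute("VACUUM")   # now succeeds
print("vacuum ok")
```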
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408056#comment-17408056 ] ASF GitHub Bot commented on AIRFLOW-6786: - github-actions[bot] closed pull request #12388: URL: https://github.com/apache/airflow/pull/12388 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor > > > Key: AIRFLOW-6786 > URL: https://issues.apache.org/jira/browse/AIRFLOW-6786 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, hooks >Affects Versions: 1.10.9 >Reporter: Daniel Ferguson >Assignee: Daniel Ferguson >Priority: Minor > > Add the KafkaProducerHook. > Add the KafkaConsumerHook. > Add the KafkaSensor which listens to messages with a specific topic. > Related Issue: > #1311 (Pre-dates Jira Migration) > Reminder to contributors: > You must add an Apache License header to all new files > Please squash your commits when possible and follow the 7 rules of good Git > commits > I am new to the community, I am not sure the files are at the right place or > missing anything. > The sensor could be used as the first node of a dag where the second node can > be a TriggerDagRunOperator. The messages are polled in a batch and the dag > runs are dynamically generated. > Thanks! > Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], > it is important to mention these integrations are not suitable for > low-latency/high-throughput/streaming. For reference, [#1415 > (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806]. 
> Co-authored-by: Dan Ferguson > [dferguson...@gmail.com|mailto:dferguson...@gmail.com] > Co-authored-by: YuanfΞi Zhu -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-1424) Make the next execution date of DAGs visible
[ https://issues.apache.org/jira/browse/AIRFLOW-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407991#comment-17407991 ] ASF GitHub Bot commented on AIRFLOW-1424: - bbovenzi closed pull request #2460: URL: https://github.com/apache/airflow/pull/2460 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Make the next execution date of DAGs visible > > > Key: AIRFLOW-1424 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1424 > Project: Apache Airflow > Issue Type: New Feature > Components: DAG >Reporter: Ultrabug >Priority: Major > > The scheduler's DAG run creation logic can be tricky and one is > easily confused with the start_date + interval > and period end scheduling way of thinking. > It would ease airflow's usage to add a next execution field to DAGs > so that we can very easily see the (un)famous period end after which > the scheduler will create a new DAG run for our workflows. > These patches are a simple way to implement this on the DAG model > and make use of this in the interface. > Screenshots are on the github PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
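The start_date + interval confusion the issue describes comes from the period-end rule: the run covering [execution_date, execution_date + interval) is only created once that period has ended. Illustrative arithmetic only, not Airflow code:

```python
from datetime import datetime, timedelta

start_date = datetime(2021, 1, 1)
interval = timedelta(days=1)

def run_created_at(execution_date: datetime) -> datetime:
    # The scheduler creates a run when the covered period closes, so the
    # first run (execution_date == start_date) appears one interval later.
    return execution_date + interval

def next_execution_date(last_execution: datetime) -> datetime:
    # The "next execution" field this issue proposes would surface this value.
    return last_execution + interval

print(run_created_at(start_date))       # 2021-01-02 00:00:00
print(next_execution_date(start_date))  # 2021-01-02 00:00:00
```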
[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files
[ https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407441#comment-17407441 ] ASF GitHub Bot commented on AIRFLOW-5703: - AngryHelper commented on a change in pull request #16792: URL: https://github.com/apache/airflow/pull/16792#discussion_r698559013 ## File path: airflow/providers/sftp/operators/sftp.py ## @@ -88,6 +124,9 @@ def __init__( remote_host=None, local_filepath=None, remote_filepath=None, +local_folder=None, +remote_folder=None, +regexp_mask=None, Review comment: for this case will print AirflowException with examples https://github.com/apache/airflow/pull/16792/files#diff-911f39c07d851bc2f0a2958674ed940ddc68aea6ecc7d71c29567816c182eaf6R216 I checked the example of dag again, I am comfortable working with one operetor, even if he has combined arguments. Your idea i understand too, may be we will wait @potiuk and hear his opinion ## File path: airflow/providers/sftp/operators/sftp.py ## @@ -134,29 +176,81 @@ def execute(self, context: Any) -> str: with self.ssh_hook.get_conn() as ssh_client: sftp_client = ssh_client.open_sftp() -if self.operation.lower() == SFTPOperation.GET: -local_folder = os.path.dirname(self.local_filepath) -if self.create_intermediate_dirs: -Path(local_folder).mkdir(parents=True, exist_ok=True) -file_msg = f"from {self.remote_filepath} to {self.local_filepath}" -self.log.info("Starting to transfer %s", file_msg) -sftp_client.get(self.remote_filepath, self.local_filepath) -else: -remote_folder = os.path.dirname(self.remote_filepath) -if self.create_intermediate_dirs: -_make_intermediate_dirs( -sftp_client=sftp_client, -remote_directory=remote_folder, +if self.local_filepath and self.remote_filepath: +if isinstance(self.local_filepath, list) and isinstance(self.remote_filepath, str): +for file_path in self.local_filepath: +local_folder = os.path.dirname(file_path) +local_file = os.path.basename(file_path) +file_msg = file_path +self._transfer(sftp_client, 
local_folder, local_file, self.remote_filepath) +elif isinstance(self.remote_filepath, list) and isinstance(self.local_filepath, str): +for file_path in self.remote_filepath: +remote_folder = os.path.dirname(file_path) +remote_file = os.path.basename(file_path) +file_msg = file_path +self._transfer(sftp_client, self.local_filepath, remote_file, remote_folder) +elif isinstance(self.remote_filepath, str) and isinstance(self.local_filepath, str): +local_folder = os.path.dirname(self.local_filepath) +file_msg = self.local_filepath +self._transfer( +sftp_client, +local_folder, +self.local_filepath, +self.remote_filepath, +only_file=True, ) -file_msg = f"from {self.local_filepath} to {self.remote_filepath}" -self.log.info("Starting to transfer file %s", file_msg) -sftp_client.put(self.local_filepath, self.remote_filepath, confirm=self.confirm) +elif self.local_folder and self.remote_folder: +if self.operation.lower() == SFTPOperation.PUT: +files_list = self._search_files(os.listdir(self.local_folder)) +for file in files_list: +local_file = os.path.basename(file) +file_msg = file +self._transfer(sftp_client, self.local_folder, local_file, self.remote_folder) +elif self.operation.lower() == SFTPOperation.GET: +files_list = self._search_files(sftp_client.listdir(self.remote_folder)) +for file in files_list: +remote_file = ntpath.basename(file) Review comment: yep, sorry, fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:
[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files
[ https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407434#comment-17407434 ] ASF GitHub Bot commented on AIRFLOW-5703: - AngryHelper commented on pull request #16792: URL: https://github.com/apache/airflow/pull/16792#issuecomment-908647425 sorry for flood, I want create another PR with true commits -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow - SFTP Operator for multiple files > -- > > Key: AIRFLOW-5703 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5703 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 1.10.5 >Reporter: Mattia >Assignee: Andrey Kucherov >Priority: Major > > *AS* User > *I WANT TO* download / upload multiple files from sftp server > *SO THAT* i need the possibility to add a list of file instead of single one -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files
[ https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407425#comment-17407425 ] ASF GitHub Bot commented on AIRFLOW-5703: - potiuk commented on pull request #16792: URL: https://github.com/apache/airflow/pull/16792#issuecomment-908509451 Can you please rebase to latest `main` @AngryHelper ? This will fix the failing image build, -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow - SFTP Operator for multiple files > -- > > Key: AIRFLOW-5703 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5703 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 1.10.5 >Reporter: Mattia >Assignee: Andrey Kucherov >Priority: Major > > *AS* User > *I WANT TO* download / upload multiple files from sftp server > *SO THAT* i need the possibility to add a list of file instead of single one -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files
[ https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407408#comment-17407408 ] ASF GitHub Bot commented on AIRFLOW-5703: - AngryHelper closed pull request #16792: URL: https://github.com/apache/airflow/pull/16792 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow - SFTP Operator for multiple files > -- > > Key: AIRFLOW-5703 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5703 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 1.10.5 >Reporter: Mattia >Assignee: Andrey Kucherov >Priority: Major > > *AS* User > *I WANT TO* download / upload multiple files from sftp server > *SO THAT* i need the possibility to add a list of file instead of single one -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files
[ https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406629#comment-17406629 ] ASF GitHub Bot commented on AIRFLOW-5703: - uranusjr commented on a change in pull request #16792: URL: https://github.com/apache/airflow/pull/16792#discussion_r698291886 ## File path: airflow/providers/sftp/operators/sftp.py ## @@ -134,29 +176,81 @@ def execute(self, context: Any) -> str: with self.ssh_hook.get_conn() as ssh_client: sftp_client = ssh_client.open_sftp() -if self.operation.lower() == SFTPOperation.GET: -local_folder = os.path.dirname(self.local_filepath) -if self.create_intermediate_dirs: -Path(local_folder).mkdir(parents=True, exist_ok=True) -file_msg = f"from {self.remote_filepath} to {self.local_filepath}" -self.log.info("Starting to transfer %s", file_msg) -sftp_client.get(self.remote_filepath, self.local_filepath) -else: -remote_folder = os.path.dirname(self.remote_filepath) -if self.create_intermediate_dirs: -_make_intermediate_dirs( -sftp_client=sftp_client, -remote_directory=remote_folder, +if self.local_filepath and self.remote_filepath: +if isinstance(self.local_filepath, list) and isinstance(self.remote_filepath, str): +for file_path in self.local_filepath: +local_folder = os.path.dirname(file_path) +local_file = os.path.basename(file_path) +file_msg = file_path +self._transfer(sftp_client, local_folder, local_file, self.remote_filepath) +elif isinstance(self.remote_filepath, list) and isinstance(self.local_filepath, str): +for file_path in self.remote_filepath: +remote_folder = os.path.dirname(file_path) +remote_file = os.path.basename(file_path) +file_msg = file_path +self._transfer(sftp_client, self.local_filepath, remote_file, remote_folder) +elif isinstance(self.remote_filepath, str) and isinstance(self.local_filepath, str): +local_folder = os.path.dirname(self.local_filepath) +file_msg = self.local_filepath +self._transfer( +sftp_client, +local_folder, +self.local_filepath, 
+self.remote_filepath, +only_file=True, ) -file_msg = f"from {self.local_filepath} to {self.remote_filepath}" -self.log.info("Starting to transfer file %s", file_msg) -sftp_client.put(self.local_filepath, self.remote_filepath, confirm=self.confirm) +elif self.local_folder and self.remote_folder: +if self.operation.lower() == SFTPOperation.PUT: +files_list = self._search_files(os.listdir(self.local_folder)) +for file in files_list: +local_file = os.path.basename(file) +file_msg = file +self._transfer(sftp_client, self.local_folder, local_file, self.remote_folder) +elif self.operation.lower() == SFTPOperation.GET: +files_list = self._search_files(sftp_client.listdir(self.remote_folder)) +for file in files_list: +remote_file = ntpath.basename(file) Review comment: I think this should be `os.path.basename` like in the `PUT` block. ## File path: airflow/providers/sftp/operators/sftp.py ## @@ -88,6 +124,9 @@ def __init__( remote_host=None, local_filepath=None, remote_filepath=None, +local_folder=None, +remote_folder=None, +regexp_mask=None, Review comment: I am really not fond of this argument list since it’s too easy to pass in nonsensical combinations e.g. ```python SFTPOperator( local_filepath="/my/file", local_folder="/my/dir", ) ``` But I understand this is not easy to change the arguments due to compatibility considerations. Perhapes what we should do is to introduce a separate operator (say `SFTPBatchOperator`) to handle the new feature, instead of bolting it onto the existing one. -- This is an automated message from the Apache Git Service. To
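The reviewer's nonsensical-combination concern can be made concrete with a small guard; the argument names mirror the PR's proposal, and the helper itself is hypothetical, not part of the operator:

```python
def validate_sftp_args(local_filepath=None, remote_filepath=None,
                       local_folder=None, remote_folder=None):
    """Require exactly one of the two argument groups (illustrative sketch)."""
    file_mode = local_filepath is not None and remote_filepath is not None
    folder_mode = local_folder is not None and remote_folder is not None
    if file_mode == folder_mode:  # both groups set, or neither complete
        raise ValueError(
            "Pass either local_filepath/remote_filepath or "
            "local_folder/remote_folder, not both and not a mix."
        )
    return "files" if file_mode else "folders"

print(validate_sftp_args(local_filepath="/my/file", remote_filepath="/srv/file"))  # files
```

A separate `SFTPBatchOperator`, as suggested, would make this validation unnecessary by construction.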
[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files
[ https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406458#comment-17406458 ] ASF GitHub Bot commented on AIRFLOW-5703: - AngryHelper edited a comment on pull request #16792: URL: https://github.com/apache/airflow/pull/16792#issuecomment-907842770 > Need to fix linter issues. See guide: https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#contribution-workflow @uranusjr @potiuk Sorry for the wait; I was resolving an issue with my cloud provider. I have pushed a fix for the linter issues. Please take a look and run the workflow; as I understand it, I don't have access to do that myself. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow - SFTP Operator for multiple files > -- > > Key: AIRFLOW-5703 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5703 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 1.10.5 >Reporter: Mattia >Assignee: Andrey Kucherov >Priority: Major > > *AS* User > *I WANT TO* download / upload multiple files from sftp server > *SO THAT* i need the possibility to add a list of file instead of single one -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files
[ https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406457#comment-17406457 ] ASF GitHub Bot commented on AIRFLOW-5703: - AngryHelper commented on pull request #16792: URL: https://github.com/apache/airflow/pull/16792#issuecomment-907842770 > Need to fix linter issues. See guide: https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#contribution-workflow Sorry for the wait; I was resolving an issue with my cloud provider. I have pushed a fix for the linter issues. Please take a look and run the workflow; as I understand it, I don't have access to do that myself. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow - SFTP Operator for multiple files > -- > > Key: AIRFLOW-5703 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5703 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 1.10.5 >Reporter: Mattia >Assignee: Andrey Kucherov >Priority: Major > > *AS* User > *I WANT TO* download / upload multiple files from sftp server > *SO THAT* i need the possibility to add a list of file instead of single one -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405518#comment-17405518 ] ASF GitHub Bot commented on AIRFLOW-6786: - github-actions[bot] commented on pull request #12388: URL: https://github.com/apache/airflow/pull/12388#issuecomment-906824992 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor > > > Key: AIRFLOW-6786 > URL: https://issues.apache.org/jira/browse/AIRFLOW-6786 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, hooks >Affects Versions: 1.10.9 >Reporter: Daniel Ferguson >Assignee: Daniel Ferguson >Priority: Minor > > Add the KafkaProducerHook. > Add the KafkaConsumerHook. > Add the KafkaSensor which listens to messages with a specific topic. > Related Issue: > #1311 (Pre-dates Jira Migration) > Reminder to contributors: > You must add an Apache License header to all new files > Please squash your commits when possible and follow the 7 rules of good Git > commits > I am new to the community, I am not sure the files are at the right place or > missing anything. > The sensor could be used as the first node of a dag where the second node can > be a TriggerDagRunOperator. The messages are polled in a batch and the dag > runs are dynamically generated. > Thanks! > Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], > it is important to mention these integrations are not suitable for > low-latency/high-throughput/streaming. 
For reference, [#1415 > (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806]. > Co-authored-by: Dan Ferguson > [dferguson...@gmail.com|mailto:dferguson...@gmail.com] > Co-authored-by: YuanfΞi Zhu -- This message was sent by Atlassian Jira (v8.3.4#803005)
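The proposed sensor polls messages in batches and succeeds once something is available, so its poke logic can be sketched in a library-agnostic way. The `poll` interface below is an assumed stand-in, not the real KafkaConsumerHook API from the PR:

```python
def poke(consumer, batch_size=10):
    """Return True (sensor succeeds) once at least one message is available.

    Hypothetical sketch of a batch-polling poke(); `consumer.poll(n)` is an
    assumed stand-in interface, not the actual hook API.
    """
    messages = consumer.poll(batch_size)
    return len(messages) > 0


class StubConsumer:
    """In-memory stand-in for a Kafka consumer, for illustration only."""

    def __init__(self, backlog):
        self.backlog = list(backlog)

    def poll(self, max_records):
        # Hand back up to max_records pending messages, consuming them.
        batch, self.backlog = self.backlog[:max_records], self.backlog[max_records:]
        return batch
```

This mirrors the pattern described above: a sensor like this as the first node of a DAG returns False (keeps poking) while the topic is empty, and True once a batch arrives, at which point a TriggerDagRunOperator can fan out runs.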
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402418#comment-17402418 ] ASF GitHub Bot commented on AIRFLOW-5071: - raviporan commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-902954329 Hey guys, we are currently on Airflow 1.14. We were hitting a similar issue, with our tasks stuck in the up_for_retry state for hours. I went through this thread and the comments/inputs from various users on tweaking the poke_interval value; our original poke_interval was set to 60, and changing it to ~93 seconds resolved the issue with tasks getting into the **_up_for_retry_** state. This worked like a charm, but I wanted more detail on the race condition the scheduler gets into when poke_interval is <= 60. Appreciate your help. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Thousand os Executor reports task instance X finished (success) although the > task says its queued. Was the task killed externally? > -- > > Key: AIRFLOW-5071 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5071 > Project: Apache Airflow > Issue Type: Bug > Components: DAG, scheduler >Affects Versions: 1.10.3 >Reporter: msempere >Priority: Critical > Fix For: 1.10.12 > > Attachments: image-2020-01-27-18-10-29-124.png, > image-2020-07-08-07-58-42-972.png > > > I'm opening this issue because since I update to 1.10.3 I'm seeing thousands > of daily messages like the following in the logs: > > ``` > {{__init__.py:1580}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally? > ``` > -And looks like this is triggering also thousand of daily emails because the > flag to send email in case of failure is set to True.- > I have Airflow setup to use Celery and Redis as a backend queue service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401244#comment-17401244 ] ASF GitHub Bot commented on AIRFLOW-4922: - knutole edited a comment on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. > https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401238#comment-17401238 ] ASF GitHub Bot commented on AIRFLOW-4922: - potiuk edited a comment on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901137418 It's because you likely have custom logging configuration which does not derive from Airflow's one as explained in https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution Make sure you start with: ``` LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG) ``` and modify that config if you want to add custom configuration. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
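The reason to start from a `deepcopy` is that the default logging config is a nested dict: mutating it directly, or mutating a shallow copy, would also change Airflow's own logging setup, including the nested handler and filter definitions. A stdlib-only sketch of the pattern, using a small stand-in dict rather than the real `DEFAULT_LOGGING_CONFIG`:

```python
from copy import deepcopy

# Stand-in for airflow.config_templates.airflow_local_settings.DEFAULT_LOGGING_CONFIG;
# the real dict is much larger and carries Airflow's handlers and filters.
DEFAULT_LOGGING_CONFIG = {
    "version": 1,
    "handlers": {"task": {"class": "logging.StreamHandler"}},
}

# Derive a private copy, then customize only the copy.
LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
LOGGING_CONFIG["handlers"]["my_handler"] = {"class": "logging.NullHandler"}
# The defaults (and their nested dicts) remain untouched, so the settings
# Airflow itself relies on keep working.
```

A shallow `dict(DEFAULT_LOGGING_CONFIG)` would not be enough here: both copies would share the inner `"handlers"` dict, so adding `my_handler` would leak into the defaults.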
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401229#comment-17401229 ] ASF GitHub Bot commented on AIRFLOW-4922: - knutole commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. > https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401215#comment-17401215 ] ASF GitHub Bot commented on AIRFLOW-4922: - potiuk commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901137418 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. > https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401117#comment-17401117 ] ASF GitHub Bot commented on AIRFLOW-4922: - knutole commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901180260 Thanks @potiuk, that is exactly what happened. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. > https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401076#comment-17401076 ] ASF GitHub Bot commented on AIRFLOW-4922: - potiuk commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901138236 You likely copied "verbatim" old configuration from Airflow and modified it, so the new configuration from DEFAULT_LOGGING_CONFIG is not used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. > https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401075#comment-17401075 ] ASF GitHub Bot commented on AIRFLOW-4922: - potiuk edited a comment on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901137418 It's because you likely have custom logging configuration which does not derive from Airflow's one as explained in https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution Make sure you start with: ``` LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG) ``` and modify that config if you want to add custom configuration. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401074#comment-17401074 ] ASF GitHub Bot commented on AIRFLOW-4922: - potiuk commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901137418 It's because you likely have custom logging configuration which does not derive from Airflow's one as explained in https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution Make sure you start with: ``` LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG) ``` and only modify the config if you want to add custom configuration. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400985#comment-17400985 ] ASF GitHub Bot commented on AIRFLOW-4922: - knutole edited a comment on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426 We are also getting `No SecretsMasker found!` on Airflow 2.1.2... Could this be due to breaking changes in the configuration file? We have tried setting `hide_sensitive_var_conn_fields = False` to no avail. ```bash [2021-08-18 11:05:53,690] {celery_executor.py:120} ERROR - Failed to execute task No SecretsMasker found!. Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 117, in _execute_in_fork args.func(args) File "/usr/local/lib/python3.6/dist-packages/airflow/cli/cli_parser.py", line 48, in command return func(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/airflow/utils/cli.py", line 91, in wrapper return f(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/airflow/cli/commands/task_command.py", line 212, in task_run settings.configure_orm(disable_connection_pool=True) File "/usr/local/lib/python3.6/dist-packages/airflow/settings.py", line 224, in configure_orm mask_secret(engine.url.password) File "/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", line 91, in mask_secret _secrets_masker().add_mask(secret, name) File "/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", line 105, in _secrets_masker raise RuntimeError("No SecretsMasker found!") RuntimeError: No SecretsMasker found! 
[2021-08-18 11:05:53,710: ERROR/ForkPoolWorker-3] Task airflow.executors.celery_executor.execute_command[f6a9b0cd-bb0c-414a-a51c-80579f2d2f1e] raised unexpected: AirflowException('Celery command failed on host: 64c3bc97f173',) Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 412, in trace_task R = retval = fun(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 704, in __protected_call__ return self.run(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 88, in execute_command _execute_in_fork(command_to_exec) File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 99, in _execute_in_fork raise AirflowException('Celery command failed on host: ' + get_hostname()) airflow.exceptions.AirflowException: Celery command failed on host: 64c3bc97f173 ``` The erroring line is 105 in `_secrets_masker.py`: ```python @cache def _secrets_masker() -> "SecretsMasker": for flt in logging.getLogger('airflow.task').filters: if isinstance(flt, SecretsMasker): return flt raise RuntimeError("No SecretsMasker found!") ``` This is our logging setup in our DAG: ```python log = logging.getLogger() log.setLevel(logging.DEBUG) stream_handler = logging.StreamHandler() log.addHandler(stream_handler) ``` How to fix this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. > https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira
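The `_secrets_masker()` lookup quoted above scans the filters attached to the `airflow.task` logger specifically. That is why a custom LOGGING_CONFIG that does not derive from Airflow's default, or a DAG-level setup that only reconfigures the root logger, can end in `No SecretsMasker found!`: logging filters are not inherited through the logger hierarchy. A stdlib-only illustration, using a stand-in filter class rather than Airflow's real `SecretsMasker`:

```python
import logging


class SecretsMaskerStub(logging.Filter):
    """Stand-in for Airflow's SecretsMasker filter (illustration only)."""


def find_masker(logger_name="airflow.task"):
    # Mirrors the lookup pattern in _secrets_masker(): inspect one logger's
    # own .filters list; filters on parent loggers are never consulted.
    for flt in logging.getLogger(logger_name).filters:
        if isinstance(flt, SecretsMaskerStub):
            return flt
    raise RuntimeError("No SecretsMasker found!")


# Attaching the filter to the root logger does NOT make the lookup succeed,
# because Logger.filters is per-logger, not inherited:
logging.getLogger().addFilter(SecretsMaskerStub())
# Only attaching it to the named logger itself satisfies find_masker():
logging.getLogger("airflow.task").addFilter(SecretsMaskerStub())
```

This is also why deriving the config with `deepcopy(DEFAULT_LOGGING_CONFIG)` fixes the error: the derived config keeps the filter registered on `airflow.task`, whereas a hand-written config silently drops it.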
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400984#comment-17400984 ] ASF GitHub Bot commented on AIRFLOW-4922: - knutole edited a comment on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426 We are also getting `No SecretsMasker found!` on Airflow 2.1.2... Could this be due to breaking changes in the configuration file? We have tried setting `hide_sensitive_var_conn_fields = False` to no avail. ```bash [2021-08-18 11:05:53,690] {celery_executor.py:120} ERROR - Failed to execute task No SecretsMasker found!. Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 117, in _execute_in_fork args.func(args) File "/usr/local/lib/python3.6/dist-packages/airflow/cli/cli_parser.py", line 48, in command return func(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/airflow/utils/cli.py", line 91, in wrapper return f(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/airflow/cli/commands/task_command.py", line 212, in task_run settings.configure_orm(disable_connection_pool=True) File "/usr/local/lib/python3.6/dist-packages/airflow/settings.py", line 224, in configure_orm mask_secret(engine.url.password) File "/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", line 91, in mask_secret _secrets_masker().add_mask(secret, name) File "/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", line 105, in _secrets_masker raise RuntimeError("No SecretsMasker found!") RuntimeError: No SecretsMasker found! 
[2021-08-18 11:05:53,710: ERROR/ForkPoolWorker-3] Task airflow.executors.celery_executor.execute_command[f6a9b0cd-bb0c-414a-a51c-80579f2d2f1e] raised unexpected: AirflowException('Celery command failed on host: 64c3bc97f173',) Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 412, in trace_task R = retval = fun(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 704, in __protected_call__ return self.run(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 88, in execute_command _execute_in_fork(command_to_exec) File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 99, in _execute_in_fork raise AirflowException('Celery command failed on host: ' + get_hostname()) airflow.exceptions.AirflowException: Celery command failed on host: 64c3bc97f173 ``` The erroring line is 105 in `_secrets_masker.py`: ```python @cache def _secrets_masker() -> "SecretsMasker": for flt in logging.getLogger('airflow.task').filters: if isinstance(flt, SecretsMasker): return flt raise RuntimeError("No SecretsMasker found!") ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > If a task crashes, host name is not committed to the database so logs aren't > able to be seen in the UI > -- > > Key: AIRFLOW-4922 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4922 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.3 >Reporter: Andrew Harmon >Assignee: wanghong-T >Priority: Major > > Sometimes when a task fails, the log show the following > {code} > *** Log file does not exist: > /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Fetching from: > http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** > Failed to fetch log file from worker. Invalid URL > 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host > supplied > {code} > I believe this is due to the fact that the row is not committed to the > database until after the task finishes. > https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400982#comment-17400982 ] ASF GitHub Bot commented on AIRFLOW-4922: - knutole commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426

We are also getting `No SecretsMasker found!` on Airflow 2.1.2... Could this be due to breaking changes in the configuration file? We have tried setting `hide_sensitive_var_conn_fields = False` to no avail.

```
[2021-08-18 11:05:53,690] {celery_executor.py:120} ERROR - Failed to execute task No SecretsMasker found!.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 117, in _execute_in_fork
    args.func(args)
  File "/usr/local/lib/python3.6/dist-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/utils/cli.py", line 91, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/cli/commands/task_command.py", line 212, in task_run
    settings.configure_orm(disable_connection_pool=True)
  File "/usr/local/lib/python3.6/dist-packages/airflow/settings.py", line 224, in configure_orm
    mask_secret(engine.url.password)
  File "/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", line 91, in mask_secret
    _secrets_masker().add_mask(secret, name)
  File "/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", line 105, in _secrets_masker
    raise RuntimeError("No SecretsMasker found!")
RuntimeError: No SecretsMasker found!
[2021-08-18 11:05:53,710: ERROR/ForkPoolWorker-3] Task airflow.executors.celery_executor.execute_command[f6a9b0cd-bb0c-414a-a51c-80579f2d2f1e] raised unexpected: AirflowException('Celery command failed on host: 64c3bc97f173',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 88, in execute_command
    _execute_in_fork(command_to_exec)
  File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 99, in _execute_in_fork
    raise AirflowException('Celery command failed on host: ' + get_hostname())
airflow.exceptions.AirflowException: Celery command failed on host: 64c3bc97f173
```
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399941#comment-17399941 ] ASF GitHub Bot commented on AIRFLOW-6786: - knackjason commented on pull request #7407: URL: https://github.com/apache/airflow/pull/7407#issuecomment-899735866

> But there is this PR in progress: #12388

Ah, I didn't realize there was a newer PR. Thanks, @potiuk!

> Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
> --
>
> Key: AIRFLOW-6786
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6786
> Project: Apache Airflow
> Issue Type: New Feature
> Components: contrib, hooks
> Affects Versions: 1.10.9
> Reporter: Daniel Ferguson
> Assignee: Daniel Ferguson
> Priority: Minor
>
> Add the KafkaProducerHook.
> Add the KafkaConsumerHook.
> Add the KafkaSensor, which listens to messages with a specific topic.
> Related Issue: #1311 (pre-dates the Jira migration)
>
> Reminder to contributors:
> You must add an Apache License header to all new files.
> Please squash your commits when possible and follow the 7 rules of good Git commits.
>
> I am new to the community, and I am not sure whether the files are in the right place or whether anything is missing.
> The sensor could be used as the first node of a DAG, where the second node can be a TriggerDagRunOperator. The messages are polled in a batch and the DAG runs are dynamically generated.
> Thanks!
> Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], it is important to mention that these integrations are not suitable for low-latency/high-throughput/streaming use cases. For reference, [#1415 (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806].
> Co-authored-by: Dan Ferguson [dferguson...@gmail.com|mailto:dferguson...@gmail.com]
> Co-authored-by: YuanfΞi Zhu
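The pattern described in the issue — a sensor polls a batch of messages, then one triggered run is generated per message — can be sketched without any Kafka or Airflow dependency. The `RunConf` shape and `fan_out` helper below are illustrative assumptions, not the proposed hooks' API:

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class RunConf:
    """Illustrative conf payload a TriggerDagRunOperator-style step would receive."""
    dag_id: str
    payload: str

def fan_out(messages: Iterable[str], target_dag: str) -> list:
    # One dynamically generated run per message in the polled batch.
    return [RunConf(dag_id=target_dag, payload=m) for m in messages]

# Simulated sensor poke: a batch arrives, runs are generated from it.
batch = ["order:1", "order:2", "order:3"]   # stand-in for a polled Kafka batch
runs = fan_out(batch, target_dag="process_order")
print(len(runs))  # 3
```

The batch-then-fan-out shape is also why the issue warns these integrations are unsuited to low-latency streaming: each poll cycle pays scheduler latency per generated run.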
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399939#comment-17399939 ] ASF GitHub Bot commented on AIRFLOW-6786: - potiuk commented on pull request #7407: URL: https://github.com/apache/airflow/pull/7407#issuecomment-899734636

But there is this PR in progress: https://github.com/apache/airflow/pull/12388
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399937#comment-17399937 ] ASF GitHub Bot commented on AIRFLOW-6786: - potiuk commented on pull request #7407: URL: https://github.com/apache/airflow/pull/7407#issuecomment-899733940

Does not look like
[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399922#comment-17399922 ] ASF GitHub Bot commented on AIRFLOW-6786: - knackjason commented on pull request #7407: URL: https://github.com/apache/airflow/pull/7407#issuecomment-899724393

Did this ever get merged in? I'd be interested in having officially supported Kafka hooks.
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394452#comment-17394452 ] ASF GitHub Bot commented on AIRFLOW-4922: - gowdra01 commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-893961036

I am stuck with this too. I am using the 2.1.0 version and getting the error: *** Failed to fetch log file from worker. 503 Server Error: Service Unavailable for url:
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394442#comment-17394442 ] ASF GitHub Bot commented on AIRFLOW-4922: - xuemengran commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-893956847

In version 2.1.1, I tried modifying airflow/utils/log/file_task_handler.py to obtain the hostname by reading the log table. I confirmed through debugging that I could get the host information this way, but a bigger problem appeared: the task is marked as successful without being scheduled, and the log is still not viewable. So I am confident that to solve this problem, the host information must be written to the task_instance table before the task is executed. I think this bug is very important, because it directly affects the use of Airflow in distributed scenarios. Please solve it as soon as possible!
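The fix proposed above — persist the hostname before the task runs, not after it finishes — can be sketched with stdlib `sqlite3` standing in for the metadata database. The table and column names only mirror the `task_instance.hostname` idea; none of this is Airflow code:

```python
import socket
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, hostname TEXT, state TEXT)")
conn.execute("INSERT INTO task_instance VALUES ('my_task', '', 'queued')")

def run_task(task_id: str) -> None:
    # Commit the hostname FIRST, so a later crash still leaves it readable
    # and the UI can build a valid worker log URL.
    conn.execute(
        "UPDATE task_instance SET hostname = ?, state = 'running' WHERE task_id = ?",
        (socket.gethostname(), task_id),
    )
    conn.commit()
    raise RuntimeError("task crashed")  # simulated crash AFTER the commit

try:
    run_task("my_task")
except RuntimeError:
    pass

host, = conn.execute(
    "SELECT hostname FROM task_instance WHERE task_id = 'my_task'"
).fetchone()
print(bool(host))  # True: hostname survived the crash
```

With the commit ordered after execution instead (the current behavior the issue describes), the same crash would leave `hostname` empty and reproduce the `No host supplied` log-fetch failure.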
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393918#comment-17393918 ] ASF GitHub Bot commented on AIRFLOW-4922: - xuemengran commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-893100953

Hi, I am still seeing the issue in 2.1.1 version, my executor is celery
[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files
[ https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391415#comment-17391415 ] ASF GitHub Bot commented on AIRFLOW-5703: - uranusjr commented on pull request #16792: URL: https://github.com/apache/airflow/pull/16792#issuecomment-890806062

Need to fix linter issues. See guide: https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#contribution-workflow

> Airflow - SFTP Operator for multiple files
> --
>
> Key: AIRFLOW-5703
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5703
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Affects Versions: 1.10.5
> Reporter: Mattia
> Assignee: Andrey Kucherov
> Priority: Major
>
> *AS* a user
> *I WANT TO* download/upload multiple files from an SFTP server
> *SO THAT* I can pass a list of files instead of a single one
[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388861#comment-17388861 ] ASF GitHub Bot commented on AIRFLOW-4922: - YonatanKiron commented on pull request #6722: URL: https://github.com/apache/airflow/pull/6722#issuecomment-888423682

Same for us, exactly the same error as @vdusek posted