[jira] [Commented] (AIRFLOW-3905) Allow using parameters for sql statement in SqlSensor

2022-06-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552690#comment-17552690
 ] 

ASF GitHub Bot commented on AIRFLOW-3905:
-

malthe commented on PR #4723:
URL: https://github.com/apache/airflow/pull/4723#issuecomment-1152249671

   Shouldn't the `parameters` be templated?




> Allow using parameters for sql statement in SqlSensor
> -
>
> Key: AIRFLOW-3905
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3905
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.10.2
>Reporter: Xiaodong Deng
>Assignee: Xiaodong Deng
>Priority: Minor
> Fix For: 1.10.3
>
>
> In most SQL-related operators/sensors, the `parameters` argument is available 
> to help render the SQL command conveniently. But this is not yet available in 
> SqlSensor.
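For reference, the parameter binding being requested can be sketched with plain DB-API code. This sketch uses sqlite3 and a hypothetical `events` table for illustration only; it shows the mechanism a `parameters` argument would delegate to, not the Airflow API itself:

```python
import sqlite3

# Hypothetical table the sensor would poll (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ds TEXT, ready INTEGER)")
conn.execute("INSERT INTO events VALUES ('2019-02-14', 1)")

# With a `parameters` argument, the SQL text stays static and the values are
# bound by the driver, instead of being interpolated into the string:
sql = "SELECT ready FROM events WHERE ds = ?"
parameters = ("2019-02-14",)
row = conn.execute(sql, parameters).fetchone()
assert row == (1,)
```

Binding values this way also avoids SQL injection and lets the database cache the query plan.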



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-05-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540543#comment-17540543
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

potiuk commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1133832625

   > > @turbaszek Let me make a PR later~ We are doing pressure tests these 
days and this problem had appeared often.
   > 
   > Hey turbaszek, any chance the PR was submitted? We are experiencing this 
in 2.3.0 as well.
   
   I think you wanted to call @ghostbody, who wanted to submit the fix, @vanducng.




> Thousands of Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because 
> the flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as a backend queue service.





[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540527#comment-17540527
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

vanducng commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1133817633

   > @turbaszek Let me make a PR later~ We are doing pressure tests these days 
and this problem had appeared often.
   
Hey turbaszek, any chance the PR was submitted? We are experiencing this in 
2.3.0 as well.






[jira] [Commented] (AIRFLOW-778) Metastore Partition Sensor Broken

2022-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528879#comment-17528879
 ] 

ASF GitHub Bot commented on AIRFLOW-778:


deprosun commented on PR #2005:
URL: https://github.com/apache/airflow/pull/2005#issuecomment-209271

   Has to be:
   ```python
   MetastorePartitionSensor(
       dag=dag,
       task_id="t",
       schema="some_schema",
       table="some_table",
       partition_name="visit_date=some_date",  # without quotes
       sql="",  # have to provide an empty string due to an inheritance issue
       # conn_id and mysql_conn_id have to be duplicated
       conn_id="hive_conn",
       mysql_conn_id="hive_conn",
       soft_fail=True,
       mode="reschedule",
       poke_interval=60,
       timeout=300,
   )
   ```




> Metastore Partition Sensor Broken
> -
>
> Key: AIRFLOW-778
> URL: https://issues.apache.org/jira/browse/AIRFLOW-778
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>Priority: Blocker
>
> MetastorePartitionSensor always throws an exception on initialization due to 
> 72cc8b3006576153aa30d27643807b4ae5dfb593.
> It looks like the tests for this are only run if an explicit flag is set, 
> which is how this got past CI.
> cc [~xuanji]





[jira] [Commented] (AIRFLOW-778) Metastore Partition Sensor Broken

2022-04-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527795#comment-17527795
 ] 

ASF GitHub Bot commented on AIRFLOW-778:


deprosun commented on PR #2005:
URL: https://github.com/apache/airflow/pull/2005#issuecomment-1109120353

   I got this error when I tried to use the `MetastorePartitionSensor`:
   ```
   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/sensors/sql.py", line 72, in _get_hook
     conn = BaseHook.get_connection(self.conn_id)
   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/hooks/base.py", line 68, in get_connection
     conn = Connection.get_connection_from_secrets(conn_id)
   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/connection.py", line 410, in get_connection_from_secrets
     raise AirflowNotFoundException(f"The conn_id `{conn_id}` isn't defined")
   airflow.exceptions.AirflowNotFoundException: The conn_id `` isn't defined
   ```
   
   What is the right way of using this sensor? I am not sure why I am getting 
the above exception. This is how I am using it:
   
   ```python
   MetastorePartitionSensor(
       dag=dag,
       task_id="t",
       schema="some_schema",
       table="some_table",
       partition_name="visit_date='some_date'",
       mysql_conn_id="hive_prod_conn",
       soft_fail=True,
       mode="reschedule",
       poke_interval=60,
       timeout=300,
   )
   ```






[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter

2022-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523163#comment-17523163
 ] 

ASF GitHub Bot commented on AIRFLOW-3537:
-

potiuk commented on PR #4341:
URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100727118

   Take a look at other PRs where people added template fields. You will 
find a number of those if you search.




> Allow AWS ECS Operator to use templates in task_definition parameter
> 
>
> Key: AIRFLOW-3537
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3537
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: aws
>Reporter: tomoya tabata
>Assignee: tomoya tabata
>Priority: Minor
>
> The AWS ECS operator does not currently apply templates to the 
> task_definition parameter.
> I'd like to allow the AWS ECS operator to use templates in the 
> task_definition parameter.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter

2022-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523162#comment-17523162
 ] 

ASF GitHub Bot commented on AIRFLOW-3537:
-

potiuk commented on PR #4341:
URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100726790

   Just add the field. It's enough.
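A minimal sketch of what "adding the field" means. This uses `string.Template` as a stand-in for Airflow's Jinja rendering, and the class and attribute names here are illustrative, not the real operator:

```python
from string import Template

# Simplified model of Airflow's templating: the real implementation uses
# Jinja2, but the mechanism is the same -- the base class renders every
# attribute whose name appears in template_fields before execute() runs.
class OperatorSketch:
    template_fields: tuple = ()

    def render_template_fields(self, context: dict) -> None:
        for name in self.template_fields:
            value = getattr(self, name)
            if isinstance(value, str):
                setattr(self, name, Template(value).safe_substitute(context))

class EcsOperatorSketch(OperatorSketch):
    # Adding "task_definition" to this tuple is the whole change requested.
    template_fields = ("overrides", "task_definition")

    def __init__(self, task_definition: str, overrides: str = ""):
        self.task_definition = task_definition
        self.overrides = overrides

op = EcsOperatorSketch(task_definition="my-task-$ds")
op.render_template_fields({"ds": "2022-04-16"})
assert op.task_definition == "my-task-2022-04-16"
```

Because rendering is driven entirely by the tuple, no other code changes are needed to make a parameter templated.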






[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter

2022-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523158#comment-17523158
 ] 

ASF GitHub Bot commented on AIRFLOW-3537:
-

FridayPush commented on PR #4341:
URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100718236

   I appreciate the invitation @potiuk; generally this PR is what I would 
produce to make the change. I would likely have removed the `template_fields` 
test, as it is a value check. Additionally, the [BaseOperator already has 
tests](https://github.com/apache/airflow/blob/main/tests/models/test_baseoperator.py)
 that validate the behavior of changes to `template_fields`.
   
   Per the original PR review, the only template rendering test I can find in 
other [AWS operators is the 'batch' 
operator](https://github.com/apache/airflow/blob/main/tests/providers/amazon/aws/operators/test_batch.py#L100).
   
   If I were to submit a new PR with the changes to the template_fields, how 
should tests be handled? Would you lean towards removing the test or adding the 
per-field test referenced by mik-laj?






[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter

2022-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523038#comment-17523038
 ] 

ASF GitHub Bot commented on AIRFLOW-3537:
-

potiuk commented on PR #4341:
URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100598087

   > Wish this had been merged. A pain point for us that is just one field name 
added to a tuple.
   
   Feel free to contribute it as a PR. This is an open-source project with more 
than 2000 contributors - you are absolutely welcome to contribute such a change.






[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter

2022-04-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523004#comment-17523004
 ] 

ASF GitHub Bot commented on AIRFLOW-3537:
-

FridayPush commented on PR #4341:
URL: https://github.com/apache/airflow/pull/4341#issuecomment-1100510274

   Wish this had been merged. A pain point for us that is just one field name 
added to a tuple. 






[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-04-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520659#comment-17520659
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

woodywuuu commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1095231809

   airflow: 2.2.2 with MySQL 8, HA scheduler, Celery executor (Redis backend)
   
   From the logs, the task instances (tis) that reported the error `killed 
externally (status: success)` were rescheduled:
   1. scheduler finds a ti to schedule (ti goes from None to scheduled)
   2. scheduler queues the ti (ti goes from scheduled to queued)
   3. scheduler sends the ti to Celery
   4. worker picks up the ti
   5. worker finds the ti's state in MySQL is still scheduled 
https://github.com/apache/airflow/blob/2.2.2/airflow/models/taskinstance.py#L1224
   6. worker sets this ti back to None
   7. scheduler reschedules this ti
   8. scheduler cannot queue this ti again, finds the ti succeeded (in 
Celery), and sets it to failed
   
   From MySQL we see that all failed tasks have no external_executor_id!
   
   We use 5000 DAGs, each with 50 dummy tasks, and found that if the following 
two conditions are met, the probability of triggering this problem increases 
sharply:
   
   1. no external_executor_id is set on the queued ti in Celery 
https://github.com/apache/airflow/blob/2.2.2/airflow/jobs/scheduler_job.py#L537
      * The SQL above uses SKIP LOCKED, so some queued tis in Celery may miss 
this external_executor_id.
   2. a scheduler loop takes very long (more than 60s), so 
`adopt_or_reset_orphaned_tasks` judges that the SchedulerJob failed and tries 
to adopt orphaned tis 
https://github.com/apache/airflow/blob/9ac742885ffb83c15f7e3dc910b0cf9df073407a/airflow/executors/celery_executor.py#L442
   
   We ran these tests:
   1. patch `SchedulerJob._process_executor_events` not to set 
external_executor_id on those queued tis
      * 300+ DAGs failed with `killed externally (status: success)`; normally 
fewer than 10 do
   2. patch `adopt_or_reset_orphaned_tasks` not to adopt orphaned tis
      * no DAG failed!
   
   I read the notes 
[below](https://github.com/apache/airflow/blob/9ac742885ffb83c15f7e3dc910b0cf9df073407a/airflow/executors/celery_executor.py#L442)
 , but still don't understand this problem:
   1. why should we handle queued tis in Celery and set this external id?






[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-03-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507936#comment-17507936
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

kenny813x201 commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1069820469


   We also got the same error message. In our case, it turned out that we were 
using the same variable name for different DAGs.
   Changing the different DAGs from `as dag` to e.g. `as dags1` and `as dags2` 
solved the issue for us.
   ```python
   with DAG(
       "dag_name",
   ) as dag:
   ```
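The shadowing can be shown with a minimal stand-in class (not the real `airflow.DAG`; the DAG ids are made up): when two `with ... as dag` blocks share one name, only the last object remains bound to that module-level name, which is all a loader scanning module globals would see:

```python
# Minimal stand-in for a DAG context manager, for illustration only.
class Dag:
    def __init__(self, dag_id):
        self.dag_id = dag_id

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

with Dag("etl_a") as dag:  # binds the module-level name `dag`
    pass
with Dag("etl_b") as dag:  # rebinds `dag`; the first object is shadowed
    pass

# Only the last object survives under the shared name:
assert dag.dag_id == "etl_b"

# Distinct names keep both objects reachable:
with Dag("etl_a") as dag1, Dag("etl_b") as dag2:
    pass
assert dag1.dag_id == "etl_a" and dag2.dag_id == "etl_b"
```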


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-1746) Add a Nomad operator to trigger job from Airflow

2022-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506680#comment-17506680
 ] 

ASF GitHub Bot commented on AIRFLOW-1746:
-

shantanugadgil commented on pull request #2708:
URL: https://github.com/apache/airflow/pull/2708#issuecomment-1067538261


   Is this still planned to be merged?




> Add a Nomad operator to trigger job from Airflow
> 
>
> Key: AIRFLOW-1746
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1746
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib
>Reporter: Eyal Trabelsi
>Assignee: Eyal Trabelsi
>Priority: Major
>
> We recently faced the need to trigger Nomad jobs from Airflow and no operator 
> is available for that. 
> The operator works by registering a Nomad job and dispatching it, then 
> checking the status of the job using a method similar to botocore 
> (https://github.com/boto/botocore/blob/5a07b477114b11e6dc5f676f5db810972565b113/botocore/docs/waiter.py).
> The operator uses https://github.com/jrxFive/python-nomad which is a wrapper 
> over the Nomad REST API, written in Python.
> Link to the PR: https://github.com/apache/incubator-airflow/pull/2708





[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506060#comment-17506060
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

aakashanand92 edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1066506022


   > We face the same issue with tasks that stay indefinitely in a queued 
status, except that we don't see tasks as `up_for_retry`. It happens randomly 
within our DAGs. The task will stay in a queued status forever until we 
manually make it fail. We **don't use any sensors** at all. We are on an AWS 
MWAA instance (Airflow 2.0.2).
   > 
   > Example logs: Scheduler:
   > 
   > ```
   > [2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor 
reports task instance  
finished (failed) although the task says its queued. (Info: None) Was the task 
killed externally?
   > [2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor 
reports execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with 
status failed for try_number 1
   >  in state FAILURE
   > ```
   > 
   > Worker:
   > 
   > ```
   > [2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task 
dag_id could not be found: task0. Either the dag did not exist or it failed to 
parse..`
   > This is not seen in the worker logs for every occurrence in the scheduler 
logs.
   > ```
   > 
   > Because of the MWAA autoscaling mechanism, `worker_concurrency` is not 
configurable. `worker_autoscale`: `10, 10`. `dagbag_import_timeout`: 120s 
`dag_file_processor_timeout`: 50s `parallelism` = 48 `dag_concurrency` = 1 
`max_threads` = 8
   > 
   > We currently have 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) 
workers.
   
   @val2k Did you find a solution for this? I am also using an MWAA environment 
and facing the same issue.
   
   The tasks get stuck in queued state and when I look at the scheduler logs I 
can see the same error.
   
   "Executor reports task instance %s finished (%s) although the task says its 
%s. (Info: %s) Was the task killed externally?"
   
   I tried everything I can find in this thread but nothing seems to be working.






[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506059#comment-17506059
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

aakashanand92 edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1066506022


   > We face the same issue with tasks that stay indefinitely in a queued 
status, except that we don't see tasks as `up_for_retry`. It happens randomly 
within our DAGs. The task will stay in a queued status forever until we 
manually make it fail. We **don't use any sensors** at all. We are on an AWS 
MWAA instance (Airflow 2.0.2).
   > 
   > Example logs: Scheduler:
   > 
   > ```
   > [2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor 
reports task instance  
finished (failed) although the task says its queued. (Info: None) Was the task 
killed externally?
   > [2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor 
reports execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with 
status failed for try_number 1
   >  in state FAILURE
   > ```
   > 
   > Worker:
   > 
   > ```
   > [2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task 
dag_id could not be found: task0. Either the dag did not exist or it failed to 
parse..`
   > This is not seen in the worker logs for every occurrence in the scheduler 
logs.
   > ```
   > 
   > Because of the MWAA autoscaling mechanism, `worker_concurrency` is not 
configurable. `worker_autoscale`: `10, 10`. `dagbag_import_timeout`: 120s 
`dag_file_processor_timeout`: 50s `parallelism` = 48 `dag_concurrency` = 1 
`max_threads` = 8
   > 
   > We currently have 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) 
workers.
   
   Did you find a solution for this? I am also using an MWAA environment and 
facing the same issue.
   
   The tasks get stuck in queued state and when I look at the scheduler logs I 
can see the same error.
   
   "Executor reports task instance %s finished (%s) although the task says its 
%s. (Info: %s) Was the task killed externally?"
   
   I tried everything I can find in this thread but nothing seems to be working.






[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506058#comment-17506058
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

aakashanand92 commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1066506022


   > We face the same issue with tasks that stay indefinitely in a queued 
status, except that we don't see tasks as `up_for_retry`. It happens randomly 
within our DAGs. The task will stay in a queued status forever until we 
manually make it fail. We **don't use any sensors** at all. We are on an AWS 
MWAA instance (Airflow 2.0.2).
   > 
   > Example logs: Scheduler:
   > 
   > ```
   > [2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor 
reports task instance  
finished (failed) although the task says its queued. (Info: None) Was the task 
killed externally?
   > [2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor 
reports execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with 
status failed for try_number 1
   >  in state FAILURE
   > ```
   > 
   > Worker:
   > 
   > ```
   > [2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task 
dag_id could not be found: task0. Either the dag did not exist or it failed to 
parse..`
   > This is not seen in the worker logs for every occurrence in the scheduler 
logs.
   > ```
   > 
   > Because of the MWAA autoscaling mechanism, `worker_concurrency` is not 
configurable. `worker_autoscale`: `10, 10`. `dagbag_import_timeout`: 120s 
`dag_file_processor_timeout`: 50s `parallelism` = 48 `dag_concurrency` = 1 
`max_threads` = 8
   > 
   > We currently have 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) 
workers.
   
   Did you find a solution for this? I am also using an MWAA environment and 
facing the same issue.
   
   The tasks get stuck in queued state and when I look at the scheduler logs I 
can see the same error.
   
   "Executor reports task instance %s finished (%s) although the task says its 
%s. (Info: %s) Was the task killed externally?"
   
   I tried everything I could find in this thread but nothing seems to be 
working.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-03-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506007#comment-17506007
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

ghostbody commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1066406484


   @turbaszek Let me make a PR later~ We are doing pressure tests these days 
and this problem has appeared often.




[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-03-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505135#comment-17505135
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

turbaszek commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1065613233


   @ghostbody do you have idea how this can be addressed? 




[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501136#comment-17501136
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

ghostbody commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1058801920


   After struggling, we found a method to reproduce this issue 100% of the time!
   
   
   tl;dr 
   
   
https://github.com/apache/airflow/blob/9ac742885ffb83c15f7e3dc910b0cf9df073407a/airflow/models/taskinstance.py#L1253
   
   Add a `raise` to simulate a DB error, which is likely to happen when the DB 
is under great pressure.
   
   Then you will get the `Was the task killed externally` error every time.
   
   Conditions:
   
   - Airflow 2.2
   - Celery Executor
   
   It's because the worker uses a local task job which spawns a child process 
to execute the job. The parent process sets the task from `Queued` to `Running` 
state. However, when the preparation work in the parent process fails, it leads 
to this error directly.
   
   related code is here: 
https://github.com/apache/airflow/blob/2.2.2/airflow/jobs/local_task_job.py#L89
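   The failure mode described above can be sketched as a toy state machine. The 
names below (`parent_prepare`, `scheduler_check`) are illustrative stand-ins, 
not the real Airflow internals:

   ```python
   QUEUED, RUNNING, SUCCESS = "queued", "running", "success"

   class TaskInstance:
       """Stand-in for the task instance row in the metadata DB."""
       def __init__(self):
           self.state = QUEUED

   def parent_prepare(ti, simulate_db_error=False):
       # The parent (LocalTaskJob) promotes the task from queued to running
       # before the child actually executes. If this step raises (e.g. the DB
       # is under pressure), the row never leaves "queued".
       if simulate_db_error:
           raise RuntimeError("simulated DB error during task prep")
       ti.state = RUNNING

   def scheduler_check(ti, executor_reported_state):
       # Mirrors the scheduler's sanity check: the executor says the slot is
       # done, but the task instance row still says "queued".
       if executor_reported_state in (SUCCESS, "failed") and ti.state == QUEUED:
           return ("Executor reports task instance finished (%s) although the "
                   "task says its %s. Was the task killed externally?"
                   % (executor_reported_state, ti.state))
       return None

   ti = TaskInstance()
   try:
       parent_prepare(ti, simulate_db_error=True)  # the injected `raise`
   except RuntimeError:
       pass  # child never ran; DB row still says "queued"
   print(scheduler_check(ti, SUCCESS))
   ```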
   
   
   




[jira] [Commented] (AIRFLOW-6405) Bigquery Update Table Properties Operator

2022-03-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500879#comment-17500879
 ] 

ASF GitHub Bot commented on AIRFLOW-6405:
-

rolldeep commented on pull request #7126:
URL: https://github.com/apache/airflow/pull/7126#issuecomment-1058249946


   Hi @jithin-sukumar , thanks for your operator!
   
   Unfortunately, I can't figure out how to use it. Currently I want to upsert 
a delta table into a main table. There are no parameters for the destination 
(main) table.
   
   Here is my operator:
   ```python
   upsert_table = BigQueryUpsertTableOperator(
       task_id="upsert_table",
       dataset_id='DATASET_NAME',
       table_resource={
           "tableReference": {"tableId": f"{config.get('TABLE_NAME')}"},
           "expirationTime": (int(time.time()) + 300) * 1000,
       },
   )
   ```
   The problem is that I can't choose the destination (main) table. 
@jithin-sukumar can you explain how I can set up my destination table? As I 
see it, the current implementation uses tableReference for both the source and 
the destination. 




> Bigquery Update Table Properties Operator
> -
>
> Key: AIRFLOW-6405
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6405
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: gcp, operators
>Affects Versions: 1.10.7
>Reporter: Jithin Sukumar
>Assignee: Jithin Sukumar
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently, Airflow doesn't seem to support BigQuery update table operations 
> [1] (specifically to update the properties of a table). Is this already under 
> development?
> (The use case is conditionally updating the `expiration time` of BQ tables.)
>  
> References:
> [1]:  [https://cloud.google.com/bigquery/docs/managing-tables]
>  





[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-02-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495037#comment-17495037
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

omoumniabdou commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1046086644


   The problem for us was that we had one DAG that reached 32 parallel runnable 
tasks (32 leaf tasks), which was the value of the `parallelism` parameter. 
After this, the scheduler was not able to run (or queue) any task.
   Increasing this parameter solved the problem for us.
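The saturation described above can be shown with trivial arithmetic (numbers 
from this comment; the `[core] parallelism` config key name is assumed from a 
standard airflow.cfg): once one wide DAG occupies every slot, nothing else can 
be queued.

```python
# [core] parallelism from airflow.cfg: max task instances running
# across the whole installation at once.
parallelism = 32

# One wide DAG whose 32 leaf tasks all become runnable together.
running_leaf_tasks = 32

open_slots = parallelism - running_leaf_tasks
print(open_slots)  # 0 -> the scheduler cannot queue anything else
```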




[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-02-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494898#comment-17494898
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

ghostbody commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1045871873


   @pbotros No, we do not solve this problem yet.  




[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-02-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493003#comment-17493003
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

pbotros commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1041101221


   We also run into this fairly often, despite not using any sensors. We only 
seemed to start getting this error once we changed our Airflow database to be 
in the cloud (AWS RDB); our Airflow webserver & scheduler runs on desktop 
workstations on-premises. As others have suggested in this thread, this is a 
very annoying problem that requires manual intervention.
   
   @ghostbody any progress on determining if that's the correct root cause?




[jira] [Commented] (AIRFLOW-6602) Make "executor_config" templated field to support dynamic parameters for kubernetes executor

2022-02-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489071#comment-17489071
 ] 

ASF GitHub Bot commented on AIRFLOW-6602:
-

BayanAzima commented on pull request #7230:
URL: https://github.com/apache/airflow/pull/7230#issuecomment-1032967030


   I'd like to see this feature, as I'd like to template out a `sub_path` of a 
PV I have so that I don't expose one DAG run's data I store there to other DAG 
runs. See the example below.
   
   I notice that this issue is pretty old. Was this added in v2, or is there 
another way I can do this now? 
   
   ```python
   KUBERNETES_WORKSPACE_PVC = {
       "pod_override": k8s.V1Pod(
           spec=k8s.V1PodSpec(
               containers=[
                   k8s.V1Container(
                       name="base",
                       volume_mounts=[
                           k8s.V1VolumeMount(
                               name=Constants.KUBERNETES_WORKSPACE_PVC,
                               mount_path='/opt/airflow/mnt/workspace',
                               # sub_path="{{ ti.xcom_pull(key='run_group_id') }}",
                               read_only=False,
                           )
                       ],
                   )
               ],
               volumes=[
                   k8s.V1Volume(
                       name=Constants.KUBERNETES_WORKSPACE_PVC,
                       persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
                           claim_name=Constants.KUBERNETES_WORKSPACE_PVC
                       ),
                   )
               ],
           )
       ),
   }
   ```




> Make "executor_config" templated field to support dynamic parameters for 
> kubernetes executor
> 
>
> Key: AIRFLOW-6602
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6602
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: executor-kubernetes, executors
>Affects Versions: 1.10.7
>Reporter: Jun Xie
>Assignee: Jun Xie
>Priority: Major
>
> When running airflow with Kubernetes Executor, one specifies the 
> configurations through 
> "executor_config". At the moment, this field is not templated, meaning that 
> we won't be able to have dynamic parameters. So I did an experiment where I 
> created MyPythonOperator, which inherits PythonOperator but with 
> "executor_config" added to template_fields. However, the result shows that 
> this change itself isn't enough, because Airflow first creates a Pod based on 
> executor_config without rendering it, and then runs the task inside the pod 
> (the run triggers the Jinja template rendering).
> See an example config below showing a use case where one can mount dynamic 
> "subPath" to the image
>  
> {code:java}
> executor_config = {
> "KubernetesExecutor": {
> "image": "some_image",
> "request_memory": "2Gi",
> 'request_cpu': '1',
> "volumes": [
> {
> "name": "data",
> "persistentVolumeClaim": {"claimName": "some_claim_name"},
> },
> ],
> "volume_mounts": [
> {
> "mountPath": "/code",
> "name": "data",
> "subPath": "/code/{{ dag_run.conf['branch_name'] }}"
> },
> ]
> }
> }
> {code}
>  
>  
>  
> I then did a further experiment: in 
> trigger_tasks() from airflow/executors/base_executor.py, right before 
> execute_async() is called, I called simple_ti.render_templates(), which 
> triggers the rendering, so kubernetes_executor.execute_async() will pick 
> up the resolved parameters.
>  
> {code:java}
> # current behavior
> for i in range(min((open_slots, len(self.queued_tasks)))):
>     key, (command, _, queue, simple_ti) = sorted_queue.pop(0)
>     self.queued_tasks.pop(key)
>     self.running[key] = command
>     self.execute_async(key=key,
>                        command=command,
>                        queue=queue,
>                        executor_config=simple_ti.executor_config)
> {code}
>  
>  
> {code:java}
> # Proposed new behavior:
> for i in range(min((open_slots, len(self.queued_tasks)))):
>     key, (command, _, queue, simple_ti) = sorted_queue.pop(0)
>     self.queued_tasks.pop(key)
>     self.running[key] = command
>     simple_ti.render_templates()  # render it
>     self.execute_async(key=key,
>                        command=command,
>                        queue=queue,
>                        executor_config=simple_ti.executor_config)
> {code}
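The rendering step this proposal relies on can be illustrated standalone. This 
is a toy renderer, not Airflow's actual render_templates implementation 
(Airflow uses Jinja; `eval()` here just keeps the sketch stdlib-only, and 
`FakeDagRun` is a hypothetical stand-in for the template context):

```python
import re

def render_nested(obj, context):
    # Recursively substitute "{{ expr }}" placeholders in a nested
    # dict/list structure, the way a templated executor_config field
    # would be resolved before the pod is created.
    if isinstance(obj, str):
        return re.sub(r"\{\{\s*(.*?)\s*\}\}",
                      lambda m: str(eval(m.group(1), {}, context)), obj)
    if isinstance(obj, dict):
        return {k: render_nested(v, context) for k, v in obj.items()}
    if isinstance(obj, list):
        return [render_nested(v, context) for v in obj]
    return obj

executor_config = {
    "KubernetesExecutor": {
        "volume_mounts": [
            {"mountPath": "/code",
             "name": "data",
             "subPath": "/code/{{ dag_run.conf['branch_name'] }}"},
        ]
    }
}

class FakeDagRun:
    conf = {"branch_name": "feature-x"}

rendered = render_nested(executor_config, {"dag_run": FakeDagRun()})
print(rendered["KubernetesExecutor"]["volume_mounts"][0]["subPath"])
```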

[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476036#comment-17476036
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

val2k edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1012934214


   We face the same issue with tasks that stay indefinitely in a queued status, 
except that we don't see tasks as `up_for_retry`. It happens randomly within 
our DAGs. The task will stay in a queued status forever until we manually make 
it fail. We **don't use any sensors** at all. We are on an AWS MWAA instance 
(Airflow 2.0.2).
   
   Example logs:
   Scheduler:
   ```
   [2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor reports 
task instance  finished 
(failed) although the task says its queued. (Info: None) Was the task killed 
externally?
   [2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor reports 
execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with status 
failed for try_number 1
in state FAILURE
   ```
   
   Worker:
   ```
   [2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task 
dag_id could not be found: task0. Either the dag did not exist or it failed to 
parse..`
   This is not seen in the worker logs for every occurrence in the scheduler 
logs.
   ```
   
   Because of the MWAA autoscaling mechanism, `worker_concurrency` is not 
configurable.
   `worker_autoscale`: `10, 10`.
   `dagbag_import_timeout`: 120s
   `dag_file_processor_timeout`: 50s
   `parallelism` = 48
   `dag_concurrency` = 1
   `max_threads` = 8
   
   We currently have 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) 
workers.




[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476035#comment-17476035
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

val2k commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1012934214


   We face the same issue with tasks that stay indefinitely in a queued status, 
except that we don't see tasks as `up_for_retry`. It happens randomly within 
our DAGs. The task will stay in a queued status forever until we manually make 
it fail. We **don't use any sensors** at all. We are on an AWS MWAA instance 
(Airflow 2.0.2).
   
   Example logs:
   Scheduler:
   ```
   [2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor reports 
task instance  finished 
(failed) although the task says its queued. (Info: None) Was the task killed 
externally?
   [2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor reports 
execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with status 
failed for try_number 1
in state FAILURE
   ```
   
   Worker:
   `[2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task 
dag_id could not be found: task0. Either the dag did not exist or it failed to 
parse..`
   This is not seen in the worker logs for every occurrence in the scheduler 
logs.
   
   Because of the MWAA autoscaling mechanism, `worker_concurrency` is not 
configurable.
   `worker_autoscale`: `10, 10`.
   `dagbag_import_timeout`: 120s
   `dag_file_processor_timeout`: 50s
   `parallelism` = 48
   `dag_concurrency` = 1
   `max_threads` = 8
   
   We currently have 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) 
workers.




[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-01-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475121#comment-17475121
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

ghostbody edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804


   We reviewed the code and found that in `local_task_job.py`, the parent 
process has a `heartbeat_callback` that checks the state and child process 
return code of the `task_instance`.
   
   However, these lines may hide a bug:
   
   
![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
   
   
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)
   
   
   **The raw task command writing back the task instance's state (like success) 
doesn't mean the child process has finished (returned).**
   
   So, in this heartbeat callback, there may be a race condition where the task 
state is written back while the child process has not yet returned.
   
   In this scenario, the local task will kill the child process by mistake. 
Then the scheduler will detect this and report "task instance X finished 
(success) although the task says its queued. Was the task killed externally?"
   
   This is a simple schematic diagram:
   
   
![image](https://user-images.githubusercontent.com/8371330/149273573-45700f32-079b-4b22-8dba-d6a1ce37a243.png)
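The window described above can be modeled with a minimal, self-contained sketch (plain Python threads standing in for the parent's heartbeat and the child task process; none of these names are Airflow's actual code): the child records a terminal state before it actually exits, and a heartbeat that fires inside that window wrongly concludes the child must be killed.

```python
import threading
import time

# Hypothetical model of the race: the child writes SUCCESS to the "database"
# before it has actually exited, and the parent's heartbeat sees that state
# while the child is still running.

class TaskInstance:
    def __init__(self):
        self.state = "running"
        self.lock = threading.Lock()

def child_task(ti, done):
    # Step 1: the raw task command writes the terminal state back first ...
    with ti.lock:
        ti.state = "success"
    # Step 2: ... but the process only returns after some teardown work.
    time.sleep(0.2)
    done.set()

def heartbeat_callback(ti, done):
    # Parent-side check: terminal state recorded, child not yet returned.
    with ti.lock:
        terminal = ti.state == "success"
    if terminal and not done.is_set():
        return "race: would kill child by mistake"
    return "ok"

ti, done = TaskInstance(), threading.Event()
t = threading.Thread(target=child_task, args=(ti, done))
t.start()
time.sleep(0.05)          # heartbeat fires between step 1 and step 2
result = heartbeat_callback(ti, done)
t.join()
print(result)
```

Run as-is, the heartbeat fires between the two steps and reports the race; a fix would have to treat "state is terminal" and "process has returned" as two separate conditions.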
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Thousands of Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because the 
> flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.







[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2022-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467689#comment-17467689
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

derkuci commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-1003763219


   @xuemengran nvm.  I guess you meant the PR suggested here.  I tried that; 
the log table has changed, and I couldn't match the information easily.  The 
`execution_date` column is None when `event="cli_task_run"`, which makes 
filtering impossible.
   
   I understand why the PR was rejected.  For cases where the logs exist but 
the web UI couldn't locate the correct hostname, the issue is that the 
"task_instance" table only stores the latest `try_number`/`hostname` for a task 
run (as already indicated by @ITriangle).  The PK doesn't include `try_number`. 
 It's better to fix the task_instance table, which is more fundamental, but 
that would probably intimidate most "amateurs" (like me).
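A minimal, hypothetical illustration of that limitation (a dict standing in for the `task_instance` row; the names and URL shape are assumptions, not Airflow's real schema): because only the latest try's hostname is stored, a log URL built for an earlier try points at the wrong worker.

```python
# Hypothetical model: the task_instance "row" keeps only the hostname of the
# latest try, so building a log URL for an earlier try asks the wrong worker.
task_instance = {"dag": "my_dag", "task": "my_task",
                 "try_number": 2, "hostname": "worker-b"}

# Where each try actually ran (information the table does not retain):
actual_hosts = {1: "worker-a", 2: "worker-b"}

def log_url(ti, try_number):
    # The UI has only the single stored hostname to use for every try.
    return (f"http://{ti['hostname']}:8793/log/"
            f"{ti['dag']}/{ti['task']}/{try_number}.log")

url_for_try_1 = log_url(task_instance, 1)
print(url_for_try_1)  # names worker-b, but try 1 actually ran on worker-a
```

Keying the hostname by try number (whether in the primary key or a side table) is what would make the lookup correct for every attempt.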




> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log shows the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857





[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2022-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467682#comment-17467682
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

derkuci commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-1003754230


   > In the 2.1.1 version, I tried to modify the 
airflow/utils/log/file_task_handler.py file to obtain the hostname information 
by reading the log table. I confirmed through debug that I could get the host 
information in this way, 
   
   @xuemengran could you kindly point to how this could be done?
   
   With airflow 2.2.2 + Celery, I am seeing error messages like below due to 
`TaskInstance.hostname` being always the latest and not relying on the 
`try_number`.
   ```
   "Failed to fetch log file from worker. Client error '404 NOT FOUND' for url 
..."
   ```
   If we try really hard, the logs can be found in the local storage of 
_some_ Celery workers, but that is a huge burden for operations and 
debugging.




[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-12-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454196#comment-17454196
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

potiuk commented on pull request #12388:
URL: https://github.com/apache/airflow/pull/12388#issuecomment-987086352


   > @potiuk I have a same requirement. If I am able to implement it. Will 
raise a PR.
   
   Cool!




> Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
> 
>
> Key: AIRFLOW-6786
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6786
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks
>Affects Versions: 1.10.9
>Reporter: Daniel Ferguson
>Assignee: Daniel Ferguson
>Priority: Minor
>
> Add the KafkaProducerHook.
>  Add the KafkaConsumerHook.
>  Add the KafkaSensor which listens to messages with a specific topic.
>  Related Issue:
>  #1311 (Pre-dates Jira Migration)
> Reminder to contributors:
> You must add an Apache License header to all new files
>  Please squash your commits when possible and follow the 7 rules of good Git 
> commits
>  I am new to the community, I am not sure the files are at the right place or 
> missing anything.
> The sensor could be used as the first node of a dag where the second node can 
> be a TriggerDagRunOperator. The messages are polled in a batch and the dag 
> runs are dynamically generated.
> Thanks!
> Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], 
> it is important to mention these integrations are not suitable for 
> low-latency/high-throughput/streaming. For reference, [#1415 
> (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806].
> Co-authored-by: Dan Ferguson 
> [dferguson...@gmail.com|mailto:dferguson...@gmail.com]
>  Co-authored-by: YuanfΞi Zhu
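The batch-polling sensor idea in the description can be sketched with a stubbed consumer (all names here are hypothetical; a real implementation would wrap an actual Kafka client such as kafka-python's `KafkaConsumer`, whose `poll()` likewise returns a dict of topic-partition to record batches):

```python
# Hedged sketch of the proposed sensor: poke() polls for a batch of messages
# and succeeds once any have arrived. StubConsumer mimics the dict-of-batches
# shape a real Kafka client's poll() returns, so no broker is needed here.
class StubConsumer:
    def __init__(self, batches):
        self._batches = list(batches)

    def poll(self, timeout_ms=1000):
        # Each call hands back the next pre-canned batch ({} = nothing yet).
        return self._batches.pop(0) if self._batches else {}

class KafkaMessageSensor:
    def __init__(self, consumer):
        self.consumer = consumer
        self.messages = []

    def poke(self):
        # Collect the polled batch; the sensor "fires" once any message landed.
        batch = self.consumer.poll(timeout_ms=500)
        for records in batch.values():
            self.messages.extend(records)
        return bool(self.messages)

sensor = KafkaMessageSensor(StubConsumer([{}, {"topic-0": ["m1", "m2"]}]))
first, second = sensor.poke(), sensor.poke()
print(first, second)  # nothing on the first poll, a batch of two on the second
```

A downstream TriggerDagRunOperator could then fan out one run per collected message, matching the dynamic-run pattern the description mentions.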





[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-12-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454186#comment-17454186
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

debashis-das commented on pull request #12388:
URL: https://github.com/apache/airflow/pull/12388#issuecomment-987056630


   @potiuk I have the same requirement. If I am able to implement it, I will 
raise a PR.




[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453706#comment-17453706
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

potiuk edited a comment on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-986311760


   Well, @blotouta2, I think this issue discusses 3 or 4 different problems, so 
your comment is pretty meaningless. Also, the issue is closed, so it's likely 
yours is a different issue altogether.
   
   If you REALLY want to get help, just open a new issue and provide all the 
details you can (and ideally a reproducible case). Or, if you do not have a 
reproducible case, provide as much information as you can and open a GitHub 
Discussion.
   
   And BTW, if you are using an older version of Airflow, just upgrade to the 
newest and check: Airflow follows SemVer, so it should be rather safe to 
upgrade.






[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453287#comment-17453287
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

blotouta2 commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-985965125


   I have also got the same exception; was this issue fixed in any version?




[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452606#comment-17452606
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

potiuk commented on pull request #12388:
URL: https://github.com/apache/airflow/pull/12388#issuecomment-984991868


   unbelievable (!) you have not done it yet @serge-salamanka-1pt!




[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452605#comment-17452605
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

potiuk commented on pull request #12388:
URL: https://github.com/apache/airflow/pull/12388#issuecomment-984991544


   @serge-salamanka-1pt - maybe you would like to contribute it? Airflow is 
created by >1800 contributors, and you can become one and add Kafka support! The 
OSS world works this way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
> 
>
> Key: AIRFLOW-6786
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6786
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks
>Affects Versions: 1.10.9
>Reporter: Daniel Ferguson
>Assignee: Daniel Ferguson
>Priority: Minor
>
> Add the KafkaProducerHook.
>  Add the KafkaConsumerHook.
>  Add the KafkaSensor which listens to messages with a specific topic.
>  Related Issue:
>  #1311 (Pre-dates Jira Migration)
> Reminder to contributors:
> You must add an Apache License header to all new files
>  Please squash your commits when possible and follow the 7 rules of good Git 
> commits
>  I am new to the community, I am not sure the files are at the right place or 
> missing anything.
> The sensor could be used as the first node of a dag where the second node can 
> be a TriggerDagRunOperator. The messages are polled in a batch and the dag 
> runs are dynamically generated.
> Thanks!
> Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], 
> it is important to mention these integrations are not suitable for 
> low-latency/high-throughput/streaming. For reference, [#1415 
> (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806].
> Co-authored-by: Dan Ferguson 
> [dferguson...@gmail.com|mailto:dferguson...@gmail.com]
>  Co-authored-by: YuanfΞi Zhu



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
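The hooks and sensor proposed above were not merged into Airflow core. As a rough illustration of what the proposed KafkaSensor's poke step amounts to, here is a minimal sketch using an in-memory stand-in for a Kafka consumer (all class and method names are hypothetical, not Airflow or kafka-python API):

```python
# Sketch of the polling check a KafkaSensor would perform on each poke
# interval. `consumer` is any object with a poll(timeout) method returning a
# list of messages -- a stand-in for a real Kafka client.

class KafkaTopicSensorSketch:
    def __init__(self, consumer, topic):
        self.consumer = consumer
        self.topic = topic

    def poke(self):
        """Return True once at least one message is available on the topic."""
        messages = self.consumer.poll(timeout=1.0)
        return len(messages) > 0


class FakeConsumer:
    """In-memory stand-in used here instead of a real Kafka client."""
    def __init__(self, messages):
        self.messages = list(messages)

    def poll(self, timeout):
        batch, self.messages = self.messages, []
        return batch


sensor = KafkaTopicSensorSketch(FakeConsumer(["evt-1", "evt-2"]), "my-topic")
print(sensor.poke())  # the batch is available, so the sensor succeeds
print(sensor.poke())  # queue drained, a real sensor would keep rescheduling
```

In a DAG this sensor would sit first, followed by a TriggerDagRunOperator, as the description suggests; per the note above, this pattern is batch-oriented and not suitable for low-latency streaming.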


[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452280#comment-17452280
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

serge-salamanka-1pt commented on pull request #12388:
URL: https://github.com/apache/airflow/pull/12388#issuecomment-984473014


   Unbelievable (!) Airflow does not support Kafka out of the box yet!?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
> 
>
> Key: AIRFLOW-6786
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6786
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks
>Affects Versions: 1.10.9
>Reporter: Daniel Ferguson
>Assignee: Daniel Ferguson
>Priority: Minor
>
> Add the KafkaProducerHook.
>  Add the KafkaConsumerHook.
>  Add the KafkaSensor which listens to messages with a specific topic.
>  Related Issue:
>  #1311 (Pre-dates Jira Migration)
> Reminder to contributors:
> You must add an Apache License header to all new files
>  Please squash your commits when possible and follow the 7 rules of good Git 
> commits
>  I am new to the community; I am not sure whether the files are in the right 
> place or whether anything is missing.
> The sensor could be used as the first node of a dag where the second node can 
> be a TriggerDagRunOperator. The messages are polled in a batch and the dag 
> runs are dynamically generated.
> Thanks!
> Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], 
> it is important to mention these integrations are not suitable for 
> low-latency/high-throughput/streaming. For reference, [#1415 
> (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806].
> Co-authored-by: Dan Ferguson 
> [dferguson...@gmail.com|mailto:dferguson...@gmail.com]
>  Co-authored-by: YuanfΞi Zhu



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-12-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451785#comment-17451785
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

danilocurvelo commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-983607427


   Exactly the same error as @vdusek posted in Airflow 2.1.x. Was this fixed in 
Airflow 2.2.x?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log show the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
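The `Invalid URL 'http://:8793/...': No host supplied` error quoted above is what you get when the worker log URL is formatted with an empty hostname, i.e. before the hostname row has been committed. A minimal sketch of that failure mode (the helper name and signature are made up; only the URL shape is taken from the log excerpt):

```python
def worker_log_url(hostname: str, dag_id: str, task_id: str,
                   execution_date: str, try_number: int) -> str:
    # Mirrors the URL shape seen in the log excerpt; helper name is made up.
    return (f"http://{hostname}:8793/log/"
            f"{dag_id}/{task_id}/{execution_date}/{try_number}.log")

# With the hostname committed, the webserver has a fetchable URL:
print(worker_log_url("worker-1", "my_dag", "my_task",
                     "2019-07-07T09:00:00+00:00", 1))

# With no hostname committed yet, the host is empty -- exactly the
# "No host supplied" shape from the log above:
print(worker_log_url("", "my_dag", "my_task",
                     "2019-07-07T09:00:00+00:00", 1))
```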


[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450537#comment-17450537
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

taohuzefu edited a comment on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-981768485


   > Hi, I am still seeing the issue in 2.1.1 version, my executor is celery
   
   @xuemengran
   Hi, how about now? Did you fix that? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log show the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450536#comment-17450536
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

taohuzefu commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-981768485


   > Hi, I am still seeing the issue in 2.1.1 version, my executor is celery
   
   Hi, how about now? Did you fix that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log show the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (AIRFLOW-3702) Reverse Backfilling(from current date to start date)

2021-11-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17441053#comment-17441053
 ] 

ASF GitHub Bot commented on AIRFLOW-3702:
-

aashayVimeo commented on pull request #4676:
URL: https://github.com/apache/airflow/pull/4676#issuecomment-964018648


   hi @dima-asana @feng-tao Any updates on the above comment about running a 
backwards DAG from the DAG definition?
   
   ```
   from airflow import DAG
   dag = DAG(
       dag_id = "...",
       schedule_interval = "@daily",
       run_backwards = True,
       ...
   )
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reverse Backfilling(from current date to start date)
> 
>
> Key: AIRFLOW-3702
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3702
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: configuration, DAG, models
>Affects Versions: 1.10.1
> Environment: MacOS High Sierra
> python2.7
> Airflow-1.10.1
>Reporter: Shubham Gupta
>Assignee: Dima Kamalov
>Priority: Major
>  Labels: critical, improvement, priority
> Fix For: 1.10.3
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Hello,
> I think there is a need for a reverse backfilling option as well, because 
> recent jobs would take precedence over the historical jobs. We could add a 
> variable to the DAG such as dagrun_order_default = True/False. This would 
> help in many use cases in which a previous date's pipeline does not depend 
> on the current one.
> I saw this page which talks about this -> 
> http://mail-archives.apache.org/mod_mbox/airflow-dev/201804.mbox/%3CCAPUwX3M7_qrn=1bqysmkdv_ifjbta6lbtq7czhhexszmdjk...@mail.gmail.com%3E
> Thanks!
> Regards,
> Shubham



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
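The request boils down to iterating the backfill dates newest-first. A minimal sketch of that ordering in plain Python, independent of Airflow's actual backfill implementation (`backfill_dates` and its `run_backwards` parameter are illustrative names only):

```python
from datetime import date, timedelta

def backfill_dates(start, end, run_backwards=False):
    """Daily schedule dates between start and end, inclusive.

    With run_backwards=True the most recent date is processed first,
    which is the behavior the feature request asks for.
    """
    days = (end - start).days + 1
    dates = [start + timedelta(days=i) for i in range(days)]
    return list(reversed(dates)) if run_backwards else dates

print(backfill_dates(date(2019, 1, 1), date(2019, 1, 3)))
print(backfill_dates(date(2019, 1, 1), date(2019, 1, 3), run_backwards=True))
```

Note this only makes sense when, as the description says, a given date's run does not depend on the previous date's run.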


[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-11-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438722#comment-17438722
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

zhengxianh removed a comment on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-957984427


   Hi, I'm trying to set up Airflow 2.1.4 with docker on multiple machines, but 
failed when the webserver is not able to access the task logs. Can anyone help 
me out please?
   
   I've created a post in stackoverflow with details:
   
https://stackoverflow.com/questions/68694805/airflow-webserver-not-able-to-access-remote-worker-logs
   
   If you need any other info, please let me know.
   
   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log show the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-11-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437616#comment-17437616
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

zhengxianh commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-957984427


   Hi, I'm trying to set up Airflow 2.1.4 with docker on multiple machines, but 
failed when the webserver is not able to access the task logs. Can anyone help 
me out please?
   
   I've created a post in stackoverflow with details:
   
https://stackoverflow.com/questions/68694805/airflow-webserver-not-able-to-access-remote-worker-logs
   
   If you need any other info, please let me know.
   
   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log show the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-11-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437528#comment-17437528
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

zhengxianh commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-957984427


   Hi, I'm trying to set up Airflow 2.1.4 with docker on multiple machines, but 
failed when the webserver is not able to access the task logs. Can anyone help 
me out please?
   
   I've created a post in stackoverflow with details:
   
https://stackoverflow.com/questions/68694805/airflow-webserver-not-able-to-access-remote-worker-logs
   
   If you need any other info, please let me know.
   
   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log show the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5911) Simplify lineage support and improve robustness

2021-10-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433558#comment-17433558
 ] 

ASF GitHub Bot commented on AIRFLOW-5911:
-

alokjain-01 commented on pull request #6564:
URL: https://github.com/apache/airflow/pull/6564#issuecomment-950526422


   Is this problem of lineage fixed in Airflow 2.0 onward? I am using 2.1.4 and 
still not getting any lineage in Apache Atlas. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Simplify lineage support and improve robustness
> ---
>
> Key: AIRFLOW-5911
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5911
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: lineage
>Affects Versions: 1.10.6
>Reporter: Bolke de Bruin
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2021-10-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432763#comment-17432763
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

nguyenmphu edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-949189802


   I found that in the code of `airflow/jobs/scheduler_job.py`: 
https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job.py#L535
   ``` python
   if ti.try_number == buffer_key.try_number and ti.state == State.QUEUED:
       Stats.incr('scheduler.tasks.killed_externally')
       msg = (
           "Executor reports task instance %s finished (%s) although the "
           "task says its %s. (Info: %s) Was the task killed externally?"
       )
       self.log.error(msg, ti, state, ti.state, info)
   ```
   The scheduler checks the state of the task instance. When a task instance is 
rescheduled (e.g. an external sensor), its state transitions up_for_reschedule 
-> scheduled -> queued -> running. If its state is queued and has not yet moved 
to the running state, the scheduler will raise this error.
   So I think the code needs to be changed:
   ``` python
   if ti.try_number == buffer_key.try_number and (
       ti.state == State.QUEUED
       and not TaskReschedule.find_for_task_instance(ti, session=session)
   ):
       Stats.incr('scheduler.tasks.killed_externally')
       msg = (
           "Executor reports task instance %s finished (%s) although the "
           "task says its %s. (Info: %s) Was the task killed externally?"
       )
       self.log.error(msg, ti, state, ti.state, info)
   Here is my PR: https://github.com/apache/airflow/pull/19123


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Thousand os Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I update to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And looks like this is triggering also thousand of daily emails because the 
> flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2021-10-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432762#comment-17432762
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

nguyenmphu commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-949189802


   I found that in the code of `airflow/jobs/scheduler_job.py`: 
https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job.py#L535
   ``` python
   if ti.try_number == buffer_key.try_number and ti.state == State.QUEUED:
       Stats.incr('scheduler.tasks.killed_externally')
       msg = (
           "Executor reports task instance %s finished (%s) although the "
           "task says its %s. (Info: %s) Was the task killed externally?"
       )
       self.log.error(msg, ti, state, ti.state, info)
   ```
   The scheduler checks the state of the task instance. When a task instance is 
rescheduled (e.g. an external sensor), its state transitions up_for_reschedule 
-> scheduled -> queued -> running. If its state is queued and has not yet moved 
to the running state, the scheduler will raise this error.
   So I think the code needs to change:
   ``` python
   if ti.try_number == buffer_key.try_number and (
       ti.state == State.QUEUED
       and not TaskReschedule.find_for_task_instance(ti, session=session)
   ):
       Stats.incr('scheduler.tasks.killed_externally')
       msg = (
           "Executor reports task instance %s finished (%s) although the "
           "task says its %s. (Info: %s) Was the task killed externally?"
       )
       self.log.error(msg, ti, state, ti.state, info)
   ```
   Here is my PR: https://github.com/apache/airflow/pull/19123


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Thousand os Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I update to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And looks like this is triggering also thousand of daily emails because the 
> flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2021-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430385#comment-17430385
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

jledru-redoute commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-946459118


   Hello,
   Still on version 1.10.12 managed by Cloud Composer, but we are intending to 
move quite quickly to Airflow 2.
   But it seems that this issue is not really resolved in version 2.
   We are experiencing this issue not every day, but quite often, and always on 
the same DAGs. Those DAGs are dynamically generated by the same Python file, 
based on scanning conf files. It generally takes around 12s to parse, so I 
don't think that is the issue. It looks like this:
   ```
   for country in DAG_PARAMS['countries']:
   
       for audience_type in AUDIENCES_TYPE:
   
           # get audiences conf file to generate the dags
           conf_files = glob.glob(
               f"/home/airflow/gcs/data/CAM/{ country['country_code'] }"
               f"/COMPOSER_PARAM_SOURCES/{ audience_type['type'] }/*")
   
           audiences_list = []
   
           for conf_file in conf_files:
   
               string_conf = open(conf_file, 'rb').read().decode("UTF-8")
               audiences_list.append(json.loads(string_conf))
   
           for letter in ascii_uppercase:
               dag_aud_list = [
                   aud for aud in audiences_list
                   if aud["CATEG_CODE"][0] == letter]
   
               if dag_aud_list:
                   dag = create_dag(audience_type, country, dag_aud_list)
                   globals()[
                       f"{ audience_type['type'] }_"
                       f"{ country['country_code'] }_{ letter }_dag"] = dag
   ```
   I understand it is not quite recommended (however, what is the recommended 
approach for this type of DAG?) but that's the way it is done.
   It generates for now around 10 DAGs with approx 35 init sensors in 
reschedule mode every 20 minutes.
   The worker machine is n1-standard-4, set with worker_concurrency at 24. 
   Yesterday, of 35 Celery tasks set to be rescheduled, 32 of them were 
rescheduled on the same worker (there are 3 workers) at roughly the same time 
(I'm not sure how to see whether the worker concurrency was respected or not, 
but I doubt it), causing 17 of them to fail with this specific issue ...
   If I understand correctly, setting worker_autoscale to "4,2" (and keeping 
worker_concurrency at 24) would resolve the issue? 
   Thanks,


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Thousand os Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I update to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And looks like this is triggering also thousand of daily emails because the 
> flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
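For reference, the two Celery settings discussed above live in the `[celery]` section of `airflow.cfg`; a sketch with illustrative values (when `worker_autoscale` is set, it takes precedence over `worker_concurrency`):

```
[celery]
# Fixed number of task slots per worker; ignored when worker_autoscale is set.
worker_concurrency = 24

# "max,min" worker processes; Celery scales between these based on load.
worker_autoscale = 4,2
```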


[jira] [Commented] (AIRFLOW-3739) Make start_date optional

2021-09-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417296#comment-17417296
 ] 

ASF GitHub Bot commented on AIRFLOW-3739:
-

BasPH commented on pull request #5423:
URL: https://github.com/apache/airflow/pull/5423#issuecomment-922438443


   It's been a while but I can create a new PR -- still think it's valuable for 
DAGs that are triggered only manually.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make start_date optional
> 
>
> Key: AIRFLOW-3739
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3739
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 1.10.3
>Reporter: Anatoli Babenia
>Priority: Major
>
> I want to define a DAG, but not schedule it for running.
> ```
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> dag = DAG('115', schedule_interval="@daily")
> seed = BashOperator(
>     task_id='get_seed',
>     bash_command='date'
> )
> dag >> seed
> ```
> This fails with the error below.
> ```
> airflow.exceptions.AirflowException: Task is missing the start_date parameter
> zsh returned exit code 1
> ```
> Is it possible to make `start_date` optional? If not, why?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
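The exception above comes from start_date resolution when a task is bound to its DAG: the task must obtain a start_date from its own arguments, the DAG's default_args, or the DAG itself. A minimal sketch of that rule in plain Python (not Airflow's actual code; the function name is made up):

```python
from datetime import datetime

def resolve_start_date(task_start_date=None, default_args=None,
                       dag_start_date=None):
    """First non-None value wins: the task's own argument, then the DAG's
    default_args, then the DAG-level start_date. Sketch of the resolution
    order, not Airflow's implementation."""
    default_args = default_args or {}
    start = task_start_date or default_args.get("start_date") or dag_start_date
    if start is None:
        raise ValueError("Task is missing the start_date parameter")
    return start

# A DAG-level start_date is enough for every task bound to that DAG:
print(resolve_start_date(dag_start_date=datetime(2019, 1, 1)))

# With no start_date anywhere, we get the exception from the issue:
try:
    resolve_start_date()
except ValueError as exc:
    print(exc)
```

So the snippet in the description fails because neither the DAG nor the BashOperator supplies a start_date; passing `start_date` to `DAG(...)` or via `default_args` resolves it.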


[jira] [Commented] (AIRFLOW-3739) Make start_date optional

2021-09-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417295#comment-17417295
 ] 

ASF GitHub Bot commented on AIRFLOW-3739:
-

potiuk edited a comment on pull request #5423:
URL: https://github.com/apache/airflow/pull/5423#issuecomment-922436740


   The repository for that PR has been deleted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make start_date optional
> 
>
> Key: AIRFLOW-3739
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3739
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 1.10.3
>Reporter: Anatoli Babenia
>Priority: Major
>
> I want to define DAG, but not schedule it for running.
> ```
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> dag = DAG('115', schedule_interval="@daily")
> seed = BashOperator(
>     task_id='get_seed',
>     bash_command='date'
> )
> dag >> seed
> ```
> This fails with the error below.
> ```
> airflow.exceptions.AirflowException: Task is missing the start_date parameter
> zsh returned exit code 1
> ```
> Is it possible to make `start_date` optional? If not, why?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3739) Make start_date optional

2021-09-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417294#comment-17417294
 ] 

ASF GitHub Bot commented on AIRFLOW-3739:
-

potiuk commented on pull request #5423:
URL: https://github.com/apache/airflow/pull/5423#issuecomment-922436740


   The repository for that project has been deleted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make start_date optional
> 
>
> Key: AIRFLOW-3739
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3739
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 1.10.3
>Reporter: Anatoli Babenia
>Priority: Major
>
> I want to define DAG, but not schedule it for running.
> ```
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> dag = DAG('115', schedule_interval="@daily")
> seed = BashOperator(
>     task_id='get_seed',
>     bash_command='date'
> )
> dag >> seed
> ```
> This fails with the error below.
> ```
> airflow.exceptions.AirflowException: Task is missing the start_date parameter
> zsh returned exit code 1
> ```
> Is it possible to make `start_date` optional? If not, why?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3739) Make start_date optional

2021-09-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417286#comment-17417286
 ] 

ASF GitHub Bot commented on AIRFLOW-3739:
-

abitrolly commented on pull request #5423:
URL: https://github.com/apache/airflow/pull/5423#issuecomment-922424760


   @potiuk why is it closed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make start_date optional
> 
>
> Key: AIRFLOW-3739
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3739
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 1.10.3
>Reporter: Anatoli Babenia
>Priority: Major
>
> I want to define DAG, but not schedule it for running.
> ```
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> dag = DAG('115', schedule_interval="@daily")
> seed = BashOperator(
>     task_id='get_seed',
>     bash_command='date'
> )
> dag >> seed
> ```
> This fails with the error below.
> ```
> airflow.exceptions.AirflowException: Task is missing the start_date parameter
> zsh returned exit code 1
> ```
> Is it possible to make `start_date` optional? If not, why?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3739) Make start_date optional

2021-09-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417229#comment-17417229
 ] 

ASF GitHub Bot commented on AIRFLOW-3739:
-

potiuk closed pull request #5423:
URL: https://github.com/apache/airflow/pull/5423


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make start_date optional
> 
>
> Key: AIRFLOW-3739
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3739
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 1.10.3
>Reporter: Anatoli Babenia
>Priority: Major
>
> I want to define DAG, but not schedule it for running.
> ```
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> dag = DAG('115', schedule_interval="@daily")
> seed = BashOperator(
>     task_id='get_seed',
>     bash_command='date'
> )
> dag >> seed
> ```
> This fails with the error below.
> ```
> airflow.exceptions.AirflowException: Task is missing the start_date parameter
> zsh returned exit code 1
> ```
> Is it possible to make `start_date` optional? If not, why?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6544) Log_id is missing when writing logs to elastic search

2021-09-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417230#comment-17417230
 ] 

ASF GitHub Bot commented on AIRFLOW-6544:
-

potiuk closed pull request #7141:
URL: https://github.com/apache/airflow/pull/7141


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Log_id is missing when writing logs to elastic search
> -
>
> Key: AIRFLOW-6544
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6544
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.7
>Reporter: Larry Zhu
>Assignee: Larry Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The “end of log” marker does not include the aforementioned log_id. The issue 
> is that airflow-web does not know when to stop tailing the logs.
> Also, it would be better to include an elasticsearch configuration for the 
> index name, so that searches are more efficient in big clusters with a lot of 
> indices.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4374) Inherit from Enum

2021-09-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417228#comment-17417228
 ] 

ASF GitHub Bot commented on AIRFLOW-4374:
-

potiuk closed pull request #5302:
URL: https://github.com/apache/airflow/pull/5302


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Inherit from Enum
> -
>
> Key: AIRFLOW-4374
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4374
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: core
>Reporter: Bas Harenslak
>Assignee: Bas Harenslak
>Priority: Minor
>
> Python 3.4 introduced the Enum type, which can reduce the complexity of a few 
> enum-style classes. E.g. the TriggerRule could become:
> {code:java}
> from enum import Enum
>
> class TriggerRule(Enum):
>     ALL_SUCCESS = "all_success"
>     ALL_FAILED = "all_failed"
>     ALL_DONE = "all_done"
>     ONE_SUCCESS = "one_success"
>     ONE_FAILED = "one_failed"
>     NONE_FAILED = "none_failed"
>     NONE_SKIPPED = "none_skipped"
>     DUMMY = "dummy"
>
>     @classmethod
>     def all_triggers(cls):
>         return [tr.name for tr in TriggerRule]
> {code}
>  A quick scan showed me that the enum-like classes are TriggerRule, WeightRule, 
> -State-, but there might be more, so search first.
> Also, note this is optional and not required for running in Python 3.
>  
> Edit: not touching the State in this one, since state contains additional 
> attributes (e.g. state_color) and in Python Enum all attributes are part of 
> the Enum class, which we don't want in this case. Will have to rethink this 
> one.
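A runnable, stdlib-only sketch of the proposal above. One detail worth noting: the snippet in the issue returns `tr.name`, which yields the uppercase member names; `tr.value` (used here) yields the lowercase string constants that existing string comparisons would expect:

```python
from enum import Enum

class TriggerRule(Enum):
    ALL_SUCCESS = "all_success"
    ALL_FAILED = "all_failed"
    ALL_DONE = "all_done"
    ONE_SUCCESS = "one_success"
    ONE_FAILED = "one_failed"
    NONE_FAILED = "none_failed"
    NONE_SKIPPED = "none_skipped"
    DUMMY = "dummy"

    @classmethod
    def all_triggers(cls):
        # .value returns the lowercase string constants; .name (as in the
        # issue's snippet) would return the uppercase member names instead.
        return [tr.value for tr in cls]
```

Membership checks like `TriggerRule("all_done")` then come for free, which is one of the points of moving to Enum.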



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6334) Change CommandType in executors to be a dictionary

2021-09-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417227#comment-17417227
 ] 

ASF GitHub Bot commented on AIRFLOW-6334:
-

potiuk closed pull request #7085:
URL: https://github.com/apache/airflow/pull/7085


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Change CommandType in executors to be a dictionary
> --
>
> Key: AIRFLOW-6334
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6334
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executors
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Major
>
> Most of the executors run tasks by running `airflow tasks run ...`. This is 
> achieved by passing an ['airflow', 'tasks', 'run', ...] list to 
> subprocess.check_call. I would love to abandon this limiting type and instead 
> use a dictionary. 
> Then such a dictionary could be passed to a new method that will run the task 
> without using subprocess.
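A minimal sketch of the two shapes being compared. The dictionary's field names (`dag_id`, `task_id`) are assumptions, since the issue does not spell out a schema:

```python
import subprocess

# Current shape: a command list handed to a subprocess.
# (A harmless echo stands in for the real `airflow tasks run ...` invocation.)
subprocess.check_call(["echo", "airflow", "tasks", "run", "example_dag", "example_task"])

# Proposed shape (hypothetical fields): a dictionary describing the task,
# which an in-process runner could consume without spawning a subprocess.
command_spec = {"dag_id": "example_dag", "task_id": "example_task"}

def run_task(spec):
    # Stand-in for an in-process task runner.
    return f"ran {spec['dag_id']}.{spec['task_id']}"
```

A dictionary also serializes naturally (e.g. to JSON) for executors that ship work over a queue, which a positional argv list does only by convention.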



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4541) Replace mkdirs usage with pathlib

2021-09-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417226#comment-17417226
 ] 

ASF GitHub Bot commented on AIRFLOW-4541:
-

potiuk closed pull request #5304:
URL: https://github.com/apache/airflow/pull/5304


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Replace mkdirs usage with pathlib
> -
>
> Key: AIRFLOW-4541
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4541
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: core
>Reporter: Jarek Potiuk
>Assignee: Bas Harenslak
>Priority: Major
>
> _makedirs is used in 'airflow.utils.file.mkdirs' - it could be replaced with 
> pathlib now with Python 3.5+_
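A sketch of what the pathlib replacement could look like. The helper name `mkdirs` mirrors the one mentioned above; its body is an assumption about the intended behavior:

```python
from pathlib import Path

def mkdirs(path: str) -> None:
    # pathlib-based replacement for an os.makedirs-style helper:
    # creates all missing parent directories and tolerates the
    # directory already existing.
    Path(path).mkdir(parents=True, exist_ok=True)
```

Unlike a hand-rolled makedirs wrapper, `exist_ok=True` makes repeated calls a no-op instead of raising `FileExistsError`.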



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-139) Executing VACUUM with PostgresOperator

2021-09-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408173#comment-17408173
 ] 

ASF GitHub Bot commented on AIRFLOW-139:


HansBambel commented on pull request #1821:
URL: https://github.com/apache/airflow/pull/1821#issuecomment-910291067


   If anybody (like me) still faces this issue: you need to add 
`autocommit=True` to the `PostgresOperator`:
   ```
   vacuum = PostgresOperator(
       task_id="vacuum-database",
       postgres_conn_id=db_conn_id,
       sql="VACUUM FULL;",
       autocommit=True,
   )
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Executing VACUUM with PostgresOperator
> --
>
> Key: AIRFLOW-139
> URL: https://issues.apache.org/jira/browse/AIRFLOW-139
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.7.0
>Reporter: Rafael
>Priority: Major
> Fix For: 1.8.1
>
>
> Dear Airflow Maintainers,
> h1. Environment
> * Airflow version: *v1.7.0*
> * Airflow components: *PostgresOperator*
> * Python Version: *Python 3.5.1*
> * Operating System: *15.4.0 Darwin*
> h1. Description of Issue
> I am trying to execute a `VACUUM` command as part of a DAG with the 
> `PostgresOperator`, which fails with the following error:
> {quote}
> [2016-05-14 16:14:01,849] {__init__.py:36} INFO - Using executor 
> SequentialExecutor
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 15, in 
> args.func(args)
>   File 
> "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/bin/cli.py",
>  line 203, in run
> pool=args.pool,
>   File 
> "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/models.py",
>  line 1067, in run
> result = task_copy.execute(context=context)
>   File 
> "/usr/local/lib/python3.5/site-packages/airflow/operators/postgres_operator.py",
>  line 39, in execute
> self.hook.run(self.sql, self.autocommit, parameters=self.parameters)
>   File 
> "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/hooks/dbapi_hook.py",
>  line 109, in run
> cur.execute(s)
> psycopg2.InternalError: VACUUM cannot run inside a transaction block
> {quote}
> I could create a small python script that performs the operation, as 
> explained in [this stackoverflow 
> entry](http://stackoverflow.com/questions/1017463/postgresql-how-to-run-vacuum-from-code-outside-transaction-block).
>  However, I would like to know first if the `VACUUM` command should be 
> supported by the `PostgresOperator`.
> h1. Reproducing the Issue
> The operator can be declared as follows:
> {quote}
> conn = ('postgres_default')
> t4 = PostgresOperator(
>     task_id='vacuum',
>     postgres_conn_id=conn,
>     sql=("VACUUM public.table"),
>     dag=dag
> )
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-09-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408056#comment-17408056
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

github-actions[bot] closed pull request #12388:
URL: https://github.com/apache/airflow/pull/12388


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
> 
>
> Key: AIRFLOW-6786
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6786
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks
>Affects Versions: 1.10.9
>Reporter: Daniel Ferguson
>Assignee: Daniel Ferguson
>Priority: Minor
>
> Add the KafkaProducerHook.
>  Add the KafkaConsumerHook.
>  Add the KafkaSensor which listens to messages with a specific topic.
>  Related Issue:
>  #1311 (Pre-dates Jira Migration)
> Reminder to contributors:
> You must add an Apache License header to all new files
>  Please squash your commits when possible and follow the 7 rules of good Git 
> commits
>  I am new to the community, so I am not sure whether the files are in the right 
> place or whether anything is missing.
> The sensor could be used as the first node of a dag where the second node can 
> be a TriggerDagRunOperator. The messages are polled in a batch and the dag 
> runs are dynamically generated.
> Thanks!
> Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], 
> it is important to mention these integrations are not suitable for 
> low-latency/high-throughput/streaming. For reference, [#1415 
> (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806].
> Co-authored-by: Dan Ferguson 
> [dferguson...@gmail.com|mailto:dferguson...@gmail.com]
>  Co-authored-by: YuanfΞi Zhu



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1424) Make the next execution date of DAGs visible

2021-09-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407991#comment-17407991
 ] 

ASF GitHub Bot commented on AIRFLOW-1424:
-

bbovenzi closed pull request #2460:
URL: https://github.com/apache/airflow/pull/2460


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make the next execution date of DAGs visible
> 
>
> Key: AIRFLOW-1424
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1424
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: DAG
>Reporter: Ultrabug
>Priority: Major
>
> The scheduler's DAG run creation logic can be tricky and one is
> easily confused with the start_date + interval
> and period end scheduling way of thinking.
> It would ease airflow's usage to add a next execution field to DAGs
> so that we can very easily see the (un)famous period end after which
> the scheduler will create a new DAG run for our workflows.
> These patches are a simple way to implement this on the DAG model
> and make use of this in the interface.
> Screenshots are on the github PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407730#comment-17407730
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

github-actions[bot] closed pull request #12388:
URL: https://github.com/apache/airflow/pull/12388


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
> 
>
> Key: AIRFLOW-6786
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6786
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks
>Affects Versions: 1.10.9
>Reporter: Daniel Ferguson
>Assignee: Daniel Ferguson
>Priority: Minor
>
> Add the KafkaProducerHook.
>  Add the KafkaConsumerHook.
>  Add the KafkaSensor which listens to messages with a specific topic.
>  Related Issue:
>  #1311 (Pre-dates Jira Migration)
> Reminder to contributors:
> You must add an Apache License header to all new files
>  Please squash your commits when possible and follow the 7 rules of good Git 
> commits
>  I am new to the community, so I am not sure whether the files are in the right 
> place or whether anything is missing.
> The sensor could be used as the first node of a dag where the second node can 
> be a TriggerDagRunOperator. The messages are polled in a batch and the dag 
> runs are dynamically generated.
> Thanks!
> Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], 
> it is important to mention these integrations are not suitable for 
> low-latency/high-throughput/streaming. For reference, [#1415 
> (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806].
> Co-authored-by: Dan Ferguson 
> [dferguson...@gmail.com|mailto:dferguson...@gmail.com]
>  Co-authored-by: YuanfΞi Zhu



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1424) Make the next execution date of DAGs visible

2021-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407723#comment-17407723
 ] 

ASF GitHub Bot commented on AIRFLOW-1424:
-

bbovenzi closed pull request #2460:
URL: https://github.com/apache/airflow/pull/2460


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make the next execution date of DAGs visible
> 
>
> Key: AIRFLOW-1424
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1424
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: DAG
>Reporter: Ultrabug
>Priority: Major
>
> The scheduler's DAG run creation logic can be tricky and one is
> easily confused with the start_date + interval
> and period end scheduling way of thinking.
> It would ease airflow's usage to add a next execution field to DAGs
> so that we can very easily see the (un)famous period end after which
> the scheduler will create a new DAG run for our workflows.
> These patches are a simple way to implement this on the DAG model
> and make use of this in the interface.
> Screenshots are on the github PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407441#comment-17407441
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

AngryHelper commented on a change in pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#discussion_r698559013



##
File path: airflow/providers/sftp/operators/sftp.py
##
@@ -88,6 +124,9 @@ def __init__(
         remote_host=None,
         local_filepath=None,
         remote_filepath=None,
+        local_folder=None,
+        remote_folder=None,
+        regexp_mask=None,

Review comment:
   In this case it will raise an AirflowException with examples:
   
https://github.com/apache/airflow/pull/16792/files#diff-911f39c07d851bc2f0a2958674ed940ddc68aea6ecc7d71c29567816c182eaf6R216
   
   I checked the DAG example again; I am comfortable working with one 
operator, even if it has combined arguments.
   
   I understand your idea too; maybe we should wait for @potiuk and hear his opinion.

##
File path: airflow/providers/sftp/operators/sftp.py
##
@@ -134,29 +176,81 @@ def execute(self, context: Any) -> str:
 
 with self.ssh_hook.get_conn() as ssh_client:
 sftp_client = ssh_client.open_sftp()
-            if self.operation.lower() == SFTPOperation.GET:
-                local_folder = os.path.dirname(self.local_filepath)
-                if self.create_intermediate_dirs:
-                    Path(local_folder).mkdir(parents=True, exist_ok=True)
-                file_msg = f"from {self.remote_filepath} to {self.local_filepath}"
-                self.log.info("Starting to transfer %s", file_msg)
-                sftp_client.get(self.remote_filepath, self.local_filepath)
-            else:
-                remote_folder = os.path.dirname(self.remote_filepath)
-                if self.create_intermediate_dirs:
-                    _make_intermediate_dirs(
-                        sftp_client=sftp_client,
-                        remote_directory=remote_folder,
+            if self.local_filepath and self.remote_filepath:
+                if isinstance(self.local_filepath, list) and isinstance(self.remote_filepath, str):
+                    for file_path in self.local_filepath:
+                        local_folder = os.path.dirname(file_path)
+                        local_file = os.path.basename(file_path)
+                        file_msg = file_path
+                        self._transfer(sftp_client, local_folder, local_file, self.remote_filepath)
+                elif isinstance(self.remote_filepath, list) and isinstance(self.local_filepath, str):
+                    for file_path in self.remote_filepath:
+                        remote_folder = os.path.dirname(file_path)
+                        remote_file = os.path.basename(file_path)
+                        file_msg = file_path
+                        self._transfer(sftp_client, self.local_filepath, remote_file, remote_folder)
+                elif isinstance(self.remote_filepath, str) and isinstance(self.local_filepath, str):
+                    local_folder = os.path.dirname(self.local_filepath)
+                    file_msg = self.local_filepath
+                    self._transfer(
+                        sftp_client,
+                        local_folder,
+                        self.local_filepath,
+                        self.remote_filepath,
+                        only_file=True,
                     )
-                file_msg = f"from {self.local_filepath} to {self.remote_filepath}"
-                self.log.info("Starting to transfer file %s", file_msg)
-                sftp_client.put(self.local_filepath, self.remote_filepath, confirm=self.confirm)
+            elif self.local_folder and self.remote_folder:
+                if self.operation.lower() == SFTPOperation.PUT:
+                    files_list = self._search_files(os.listdir(self.local_folder))
+                    for file in files_list:
+                        local_file = os.path.basename(file)
+                        file_msg = file
+                        self._transfer(sftp_client, self.local_folder, local_file, self.remote_folder)
+                elif self.operation.lower() == SFTPOperation.GET:
+                    files_list = self._search_files(sftp_client.listdir(self.remote_folder))
+                    for file in files_list:
+                        remote_file = ntpath.basename(file)

Review comment:
   yep, sorry, fixed
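The folder branches in the diff above filter directory listings through a `_search_files` helper driven by the new `regexp_mask` argument. A stdlib sketch of that filtering, with the exact matching semantics assumed since the helper's body is not shown in this excerpt:

```python
import re

def search_files(files, regexp_mask=None):
    # Filter a directory listing by an optional regular expression;
    # mirrors the role of the PR's _search_files helper (assumed
    # behavior: no mask means "take everything").
    if regexp_mask is None:
        return list(files)
    pattern = re.compile(regexp_mask)
    return [name for name in files if pattern.match(name)]
```

Note that `re.match` anchors at the start of the name only, so a mask like `r".*\.csv$"` is needed to constrain the suffix.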




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 

[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407434#comment-17407434
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

AngryHelper commented on pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#issuecomment-908647425


   sorry for the flood; I want to create another PR with clean commits


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow - SFTP Operator for multiple files
> --
>
> Key: AIRFLOW-5703
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5703
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.10.5
>Reporter: Mattia
>Assignee: Andrey Kucherov
>Priority: Major
>
> *AS* User
> *I WANT TO* download / upload multiple files from sftp server
> *SO THAT*  i need the possibility to add a list of file instead of single one



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407425#comment-17407425
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

potiuk commented on pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#issuecomment-908509451


   Can you please rebase to latest `main` @AngryHelper? This will fix the 
failing image build.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow - SFTP Operator for multiple files
> --
>
> Key: AIRFLOW-5703
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5703
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.10.5
>Reporter: Mattia
>Assignee: Andrey Kucherov
>Priority: Major
>
> *AS* User
> *I WANT TO* download / upload multiple files from sftp server
> *SO THAT*  i need the possibility to add a list of file instead of single one



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407408#comment-17407408
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

AngryHelper closed pull request #16792:
URL: https://github.com/apache/airflow/pull/16792


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow - SFTP Operator for multiple files
> --
>
> Key: AIRFLOW-5703
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5703
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.10.5
>Reporter: Mattia
>Assignee: Andrey Kucherov
>Priority: Major
>
> *AS* User
> *I WANT TO* download / upload multiple files from sftp server
> *SO THAT*  i need the possibility to add a list of file instead of single one



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406919#comment-17406919
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

AngryHelper commented on pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#issuecomment-908647425


   sorry for the flood; I want to create another PR with clean commits


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406918#comment-17406918
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

AngryHelper closed pull request #16792:
URL: https://github.com/apache/airflow/pull/16792


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406833#comment-17406833
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

potiuk commented on pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#issuecomment-908509451


   Can you please rebase onto the latest `main`, @AngryHelper? This will fix the failing image build.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406786#comment-17406786
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

AngryHelper commented on a change in pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#discussion_r698591513



##
File path: airflow/providers/sftp/operators/sftp.py
##
@@ -134,29 +176,81 @@ def execute(self, context: Any) -> str:
 
         with self.ssh_hook.get_conn() as ssh_client:
             sftp_client = ssh_client.open_sftp()
-            if self.operation.lower() == SFTPOperation.GET:
-                local_folder = os.path.dirname(self.local_filepath)
-                if self.create_intermediate_dirs:
-                    Path(local_folder).mkdir(parents=True, exist_ok=True)
-                file_msg = f"from {self.remote_filepath} to {self.local_filepath}"
-                self.log.info("Starting to transfer %s", file_msg)
-                sftp_client.get(self.remote_filepath, self.local_filepath)
-            else:
-                remote_folder = os.path.dirname(self.remote_filepath)
-                if self.create_intermediate_dirs:
-                    _make_intermediate_dirs(
-                        sftp_client=sftp_client,
-                        remote_directory=remote_folder,
+            if self.local_filepath and self.remote_filepath:
+                if isinstance(self.local_filepath, list) and isinstance(self.remote_filepath, str):
+                    for file_path in self.local_filepath:
+                        local_folder = os.path.dirname(file_path)
+                        local_file = os.path.basename(file_path)
+                        file_msg = file_path
+                        self._transfer(sftp_client, local_folder, local_file, self.remote_filepath)
+                elif isinstance(self.remote_filepath, list) and isinstance(self.local_filepath, str):
+                    for file_path in self.remote_filepath:
+                        remote_folder = os.path.dirname(file_path)
+                        remote_file = os.path.basename(file_path)
+                        file_msg = file_path
+                        self._transfer(sftp_client, self.local_filepath, remote_file, remote_folder)
+                elif isinstance(self.remote_filepath, str) and isinstance(self.local_filepath, str):
+                    local_folder = os.path.dirname(self.local_filepath)
+                    file_msg = self.local_filepath
+                    self._transfer(
+                        sftp_client,
+                        local_folder,
+                        self.local_filepath,
+                        self.remote_filepath,
+                        only_file=True,
                     )
-                file_msg = f"from {self.local_filepath} to {self.remote_filepath}"
-                self.log.info("Starting to transfer file %s", file_msg)
-                sftp_client.put(self.local_filepath, self.remote_filepath, confirm=self.confirm)
+            elif self.local_folder and self.remote_folder:
+                if self.operation.lower() == SFTPOperation.PUT:
+                    files_list = self._search_files(os.listdir(self.local_folder))
+                    for file in files_list:
+                        local_file = os.path.basename(file)
+                        file_msg = file
+                        self._transfer(sftp_client, self.local_folder, local_file, self.remote_folder)
+                elif self.operation.lower() == SFTPOperation.GET:
+                    files_list = self._search_files(sftp_client.listdir(self.remote_folder))
+                    for file in files_list:
+                        remote_file = ntpath.basename(file)

Review comment:
   yep, sorry, fixed
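The list-versus-string dispatch under review can be condensed into a small helper. This is a hedged sketch, not the operator's actual code: `normalize_transfers` is a hypothetical name, remote paths are treated as POSIX, and a lone list on one side is paired against a single directory on the other, mirroring the branches in the diff above.

```python
import os.path
import posixpath

def normalize_transfers(local_filepath, remote_filepath):
    """Pair local and remote paths, accepting a list on either side.

    Hypothetical helper; remote paths are joined with posixpath since
    SFTP servers use POSIX separators regardless of the worker's OS.
    """
    if isinstance(local_filepath, list):
        # Many local files, one remote target directory.
        return [
            (path, posixpath.join(remote_filepath, os.path.basename(path)))
            for path in local_filepath
        ]
    if isinstance(remote_filepath, list):
        # Many remote files, one local target directory.
        return [
            (os.path.join(local_filepath, posixpath.basename(path)), path)
            for path in remote_filepath
        ]
    # Plain one-to-one transfer.
    return [(local_filepath, remote_filepath)]
```

Normalizing every branch into one list of pairs keeps the actual transfer loop trivial: iterate and call `sftp_client.get`/`put` once per pair.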




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406766#comment-17406766
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

AngryHelper commented on a change in pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#discussion_r698559013



##
File path: airflow/providers/sftp/operators/sftp.py
##
@@ -88,6 +124,9 @@ def __init__(
 remote_host=None,
 local_filepath=None,
 remote_filepath=None,
+local_folder=None,
+remote_folder=None,
+regexp_mask=None,

Review comment:
   For this case, an AirflowException with examples will be raised:
   
https://github.com/apache/airflow/pull/16792/files#diff-911f39c07d851bc2f0a2958674ed940ddc68aea6ecc7d71c29567816c182eaf6R216
   
   I checked the example DAG again; I am comfortable working with one operator, even if it has combined arguments.
   
   I understand your idea too; maybe we should wait for @potiuk and hear his opinion.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406629#comment-17406629
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

uranusjr commented on a change in pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#discussion_r698291886



##
File path: airflow/providers/sftp/operators/sftp.py
##
@@ -134,29 +176,81 @@ def execute(self, context: Any) -> str:
 
         with self.ssh_hook.get_conn() as ssh_client:
             sftp_client = ssh_client.open_sftp()
-            if self.operation.lower() == SFTPOperation.GET:
-                local_folder = os.path.dirname(self.local_filepath)
-                if self.create_intermediate_dirs:
-                    Path(local_folder).mkdir(parents=True, exist_ok=True)
-                file_msg = f"from {self.remote_filepath} to {self.local_filepath}"
-                self.log.info("Starting to transfer %s", file_msg)
-                sftp_client.get(self.remote_filepath, self.local_filepath)
-            else:
-                remote_folder = os.path.dirname(self.remote_filepath)
-                if self.create_intermediate_dirs:
-                    _make_intermediate_dirs(
-                        sftp_client=sftp_client,
-                        remote_directory=remote_folder,
+            if self.local_filepath and self.remote_filepath:
+                if isinstance(self.local_filepath, list) and isinstance(self.remote_filepath, str):
+                    for file_path in self.local_filepath:
+                        local_folder = os.path.dirname(file_path)
+                        local_file = os.path.basename(file_path)
+                        file_msg = file_path
+                        self._transfer(sftp_client, local_folder, local_file, self.remote_filepath)
+                elif isinstance(self.remote_filepath, list) and isinstance(self.local_filepath, str):
+                    for file_path in self.remote_filepath:
+                        remote_folder = os.path.dirname(file_path)
+                        remote_file = os.path.basename(file_path)
+                        file_msg = file_path
+                        self._transfer(sftp_client, self.local_filepath, remote_file, remote_folder)
+                elif isinstance(self.remote_filepath, str) and isinstance(self.local_filepath, str):
+                    local_folder = os.path.dirname(self.local_filepath)
+                    file_msg = self.local_filepath
+                    self._transfer(
+                        sftp_client,
+                        local_folder,
+                        self.local_filepath,
+                        self.remote_filepath,
+                        only_file=True,
                     )
-                file_msg = f"from {self.local_filepath} to {self.remote_filepath}"
-                self.log.info("Starting to transfer file %s", file_msg)
-                sftp_client.put(self.local_filepath, self.remote_filepath, confirm=self.confirm)
+            elif self.local_folder and self.remote_folder:
+                if self.operation.lower() == SFTPOperation.PUT:
+                    files_list = self._search_files(os.listdir(self.local_folder))
+                    for file in files_list:
+                        local_file = os.path.basename(file)
+                        file_msg = file
+                        self._transfer(sftp_client, self.local_folder, local_file, self.remote_folder)
+                elif self.operation.lower() == SFTPOperation.GET:
+                    files_list = self._search_files(sftp_client.listdir(self.remote_folder))
+                    for file in files_list:
+                        remote_file = ntpath.basename(file)

Review comment:
   I think this should be `os.path.basename` like in the `PUT` block.
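The distinction matters because `ntpath` always parses Windows-style separators, while `os.path` follows the host OS; on a POSIX worker, `os.path.basename` (equivalently `posixpath.basename`) matches the paths an SFTP server's `listdir()` returns. A small illustration:

```python
import ntpath
import posixpath

# ntpath splits on backslashes regardless of the host OS...
assert ntpath.basename(r"C:\outbox\report.csv") == "report.csv"
# ...while posixpath (what os.path resolves to on POSIX hosts) splits on
# forward slashes, as found in SFTP server listings.
assert posixpath.basename("/outbox/report.csv") == "report.csv"
# The reverse direction fails: posixpath sees no separator at all here,
# so the whole Windows path comes back unchanged.
assert posixpath.basename(r"C:\outbox\report.csv") == r"C:\outbox\report.csv"
```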

##
File path: airflow/providers/sftp/operators/sftp.py
##
@@ -88,6 +124,9 @@ def __init__(
 remote_host=None,
 local_filepath=None,
 remote_filepath=None,
+local_folder=None,
+remote_folder=None,
+regexp_mask=None,

Review comment:
   I am really not fond of this argument list since it’s too easy to pass 
in nonsensical combinations e.g.
   
   ```python
   SFTPOperator(
   local_filepath="/my/file",
   local_folder="/my/dir",
   )
   ```
   
   But I understand it is not easy to change the arguments due to compatibility considerations.
   
   Perhaps what we should do is introduce a separate operator (say `SFTPBatchOperator`) to handle the new feature, instead of bolting it onto the existing one.
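One way to keep the combined argument list while ruling out the nonsensical combinations is an explicit guard at construction time. A minimal sketch under stated assumptions: `validate_path_arguments` is a hypothetical helper name, and the `AirflowException` class here is a stand-in for `airflow.exceptions.AirflowException`, not the real import.

```python
class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""

def validate_path_arguments(local_filepath=None, local_folder=None,
                            remote_filepath=None, remote_folder=None):
    # File-style and folder-style arguments are mutually exclusive on
    # each side of the transfer, as the review comment points out.
    if local_filepath is not None and local_folder is not None:
        raise AirflowException(
            "Pass either local_filepath or local_folder, not both."
        )
    if remote_filepath is not None and remote_folder is not None:
        raise AirflowException(
            "Pass either remote_filepath or remote_folder, not both."
        )
```

A separate `SFTPBatchOperator` would sidestep the guard entirely, at the cost of a second operator to document and maintain.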




-- 
This is an automated message from the Apache Git Service.
To 

[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406458#comment-17406458
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

AngryHelper edited a comment on pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#issuecomment-907842770


   > Need to fix linter issues. See guide: 
https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#contribution-workflow
   
   @uranusjr @potiuk sorry for the wait, I was resolving an issue with my cloud provider.
   I pushed a fix for the linter issues; please take a look and run the workflow, since as I understand it, I don't have access to do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406457#comment-17406457
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

AngryHelper commented on pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#issuecomment-907842770


   > Need to fix linter issues. See guide: 
https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#contribution-workflow
   
   Sorry for the wait, I was resolving an issue with my cloud provider.
   I pushed a fix for the linter issues; please take a look and run the workflow, since as I understand it, I don't have access to do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-08-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405518#comment-17405518
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

github-actions[bot] commented on pull request #12388:
URL: https://github.com/apache/airflow/pull/12388#issuecomment-906824992


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed in 5 days if no further activity occurs. 
Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
> 
>
> Key: AIRFLOW-6786
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6786
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks
>Affects Versions: 1.10.9
>Reporter: Daniel Ferguson
>Assignee: Daniel Ferguson
>Priority: Minor
>
> Add the KafkaProducerHook.
>  Add the KafkaConsumerHook.
>  Add the KafkaSensor which listens to messages with a specific topic.
>  Related Issue:
>  #1311 (Pre-dates Jira Migration)
> Reminder to contributors:
> You must add an Apache License header to all new files
>  Please squash your commits when possible and follow the 7 rules of good Git 
> commits
> I am new to the community, so I am not sure whether the files are in the
> right place or anything is missing.
> The sensor could be used as the first node of a dag where the second node can 
> be a TriggerDagRunOperator. The messages are polled in a batch and the dag 
> runs are dynamically generated.
> Thanks!
> Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], 
> it is important to mention these integrations are not suitable for 
> low-latency/high-throughput/streaming. For reference, [#1415 
> (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806].
> Co-authored-by: Dan Ferguson 
> [dferguson...@gmail.com|mailto:dferguson...@gmail.com]
>  Co-authored-by: YuanfΞi Zhu



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2021-08-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402418#comment-17402418
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

raviporan commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-902954329


   Hey guys, we are currently on Airflow 1.14. We were getting a similar issue,
with our tasks going into the up_for_retry state for hours. I went through this
thread and the comments/inputs from various users on tweaking the
poke_interval value; our original poke_interval was set to 60, and changing the
value to ~93 seconds resolved the issue with tasks getting into the
**_up_for_retry_** state. This worked like a charm, but I wanted to get more
details on the race condition that the scheduler gets into when
poke_interval is <= 60. Appreciate your help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Thousands of: Executor reports task instance X finished (success) although the 
> task says it's queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because 
> the flag to send email on failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401244#comment-17401244
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

knutole edited a comment on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log shows the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401238#comment-17401238
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

potiuk edited a comment on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901137418


   It's because you likely have a custom logging configuration which does not 
derive from Airflow's, as explained in 
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution
 
   
   Make sure you start with:
   
   ```
   LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
   ```
   
   and modify that config if you want to add custom configuration. 
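The copy-then-modify pattern looks like the sketch below. The dictionary here is a minimal stand-in for Airflow's real `DEFAULT_LOGGING_CONFIG` (which lives in `airflow.config_templates.airflow_local_settings` and is much larger); only the pattern of deep-copying before customizing is the point.

```python
from copy import deepcopy

# Stand-in for Airflow's DEFAULT_LOGGING_CONFIG; the real dict is larger.
DEFAULT_LOGGING_CONFIG = {
    "version": 1,
    "handlers": {
        "task": {"class": "logging.FileHandler", "filename": "task.log"},
    },
    "loggers": {
        "airflow.task": {"handlers": ["task"], "level": "INFO"},
    },
}

# Deep-copy first so customizations never mutate Airflow's defaults,
# then adjust only the pieces you need.
LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
LOGGING_CONFIG["loggers"]["airflow.task"]["level"] = "DEBUG"
```

Starting from a verbatim copy of an old config instead of `deepcopy(DEFAULT_LOGGING_CONFIG)` is exactly what leaves new defaults (such as the hostname-aware log handlers) out of the picture.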


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401229#comment-17401229
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

knutole commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401215#comment-17401215
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

potiuk commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901137418






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401117#comment-17401117
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

knutole commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901180260


   Thanks @potiuk, that is exactly what happened. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401076#comment-17401076
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

potiuk commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901138236


   You likely copied the old configuration verbatim from Airflow and modified it, 
so the new configuration from DEFAULT_LOGGING_CONFIG is not used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log show the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857





[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401075#comment-17401075
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

potiuk edited a comment on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901137418


   It's because you likely have a custom logging configuration which does not 
derive from Airflow's, as explained in 
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution
 
   
   Make sure you start with:
   
   ```
   LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
   ```
   
   and modify that config if you want to add custom configuration. 
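
   The pattern described in this comment can be sketched as follows. This is a minimal illustration using a stand-in dict with a simplified, assumed shape; in a real deployment `DEFAULT_LOGGING_CONFIG` is imported from `airflow.config_templates.airflow_local_settings` rather than defined inline:

   ```python
   from copy import deepcopy

   # Stand-in for Airflow's DEFAULT_LOGGING_CONFIG (hypothetical simplified
   # shape, for illustration only).
   DEFAULT_LOGGING_CONFIG = {
       "version": 1,
       "disable_existing_loggers": False,
       "handlers": {"task": {"class": "logging.StreamHandler"}},
       "loggers": {"airflow.task": {"handlers": ["task"], "level": "INFO"}},
   }

   # Start from a deep copy so the defaults (including wiring such as the
   # SecretsMasker filter in a real install) are preserved intact...
   LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)

   # ...then layer custom changes on top of the copy only.
   LOGGING_CONFIG["loggers"]["airflow.task"]["level"] = "DEBUG"

   print(DEFAULT_LOGGING_CONFIG["loggers"]["airflow.task"]["level"])  # INFO
   print(LOGGING_CONFIG["loggers"]["airflow.task"]["level"])          # DEBUG
   ```

   Mutating `DEFAULT_LOGGING_CONFIG` directly (or copying it shallowly) would change the shared default dict in place; `deepcopy` keeps the custom configuration independent of the defaults.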






[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401074#comment-17401074
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

potiuk commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901137418


   It's because you likely have a custom logging configuration which does not 
derive from Airflow's, as explained in 
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution
 
   
   Make sure you start with:
   
   ```
   LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
   ```
   
   and only modify the config if you want to add custom configuration. 






[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400985#comment-17400985
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

knutole edited a comment on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426


   We are also getting `No SecretsMasker found!` on Airflow 2.1.2... 
   
   Could this be due to breaking changes in the configuration file? 
   
   We have tried setting `hide_sensitive_var_conn_fields = False` to no avail. 
   
   ```bash
   [2021-08-18 11:05:53,690] {celery_executor.py:120} ERROR - Failed to execute 
task No SecretsMasker found!.
   Traceback (most recent call last):
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", 
line 117, in _execute_in_fork
   args.func(args)
 File "/usr/local/lib/python3.6/dist-packages/airflow/cli/cli_parser.py", 
line 48, in command
   return func(*args, **kwargs)
 File "/usr/local/lib/python3.6/dist-packages/airflow/utils/cli.py", line 
91, in wrapper
   return f(*args, **kwargs)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/cli/commands/task_command.py", 
line 212, in task_run
   settings.configure_orm(disable_connection_pool=True)
 File "/usr/local/lib/python3.6/dist-packages/airflow/settings.py", line 
224, in configure_orm
   mask_secret(engine.url.password)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", 
line 91, in mask_secret
   _secrets_masker().add_mask(secret, name)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", 
line 105, in _secrets_masker
   raise RuntimeError("No SecretsMasker found!")
   RuntimeError: No SecretsMasker found!
   [2021-08-18 11:05:53,710: ERROR/ForkPoolWorker-3] Task 
airflow.executors.celery_executor.execute_command[f6a9b0cd-bb0c-414a-a51c-80579f2d2f1e]
 raised unexpected: AirflowException('Celery command failed on host: 
64c3bc97f173',)
   Traceback (most recent call last):
 File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 
412, in trace_task
   R = retval = fun(*args, **kwargs)
 File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 
704, in __protected_call__
   return self.run(*args, **kwargs)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", 
line 88, in execute_command
   _execute_in_fork(command_to_exec)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", 
line 99, in _execute_in_fork
   raise AirflowException('Celery command failed on host: ' + 
get_hostname())
   airflow.exceptions.AirflowException: Celery command failed on host: 
64c3bc97f173
   ```
   
   The erroring line is 105 in `_secrets_masker.py`:
   ```python
   @cache
   def _secrets_masker() -> "SecretsMasker":
       for flt in logging.getLogger('airflow.task').filters:
           if isinstance(flt, SecretsMasker):
               return flt
       raise RuntimeError("No SecretsMasker found!")
   ```
   
   This is our logging setup in our DAG:
   ```python
   log = logging.getLogger()
   log.setLevel(logging.DEBUG)
   stream_handler = logging.StreamHandler()
   log.addHandler(stream_handler)
   ```
   
   How to fix this? 
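
   The failure mode in this comment can be reproduced in miniature without Airflow. The sketch below uses a stand-in `FakeSecretsMasker` filter (an assumption standing in for Airflow's real `SecretsMasker`) to show that `_secrets_masker()` only searches the filters attached to the `airflow.task` logger, so attaching handlers to your own named logger is a gentler customization point than reconfiguring logging globally:

   ```python
   import logging

   class FakeSecretsMasker(logging.Filter):
       """Stand-in for Airflow's SecretsMasker (illustration only)."""

   # In a real install, Airflow's default logging config attaches the masker
   # as a filter on the 'airflow.task' logger; simulate that here.
   logging.getLogger("airflow.task").addFilter(FakeSecretsMasker())

   # The DAG-file snippet quoted above configures the *root* logger:
   root = logging.getLogger()
   root.setLevel(logging.DEBUG)
   root.addHandler(logging.StreamHandler())

   # Safer alternative: attach handlers to a module-specific logger
   # ('my_dag' is a hypothetical name) instead of touching the root or
   # replacing the global logging configuration.
   log = logging.getLogger("my_dag")
   log.addHandler(logging.StreamHandler())

   # _secrets_masker() succeeds only while a masker filter is still present
   # on 'airflow.task' -- this mirrors the loop quoted from secrets_masker.py.
   found = any(
       isinstance(flt, FakeSecretsMasker)
       for flt in logging.getLogger("airflow.task").filters
   )
   print(found)  # True
   ```

   If a later `logging.config.dictConfig`/`fileConfig` call replaces the logging tree without re-deriving from Airflow's defaults, that filter list can end up empty, and the `RuntimeError` above is raised.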





[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400984#comment-17400984
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

knutole edited a comment on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426


   We are also getting `No SecretsMasker found!` on Airflow 2.1.2... 
   
   Could this be due to breaking changes in the configuration file? 
   
   We have tried setting `hide_sensitive_var_conn_fields = False` to no avail. 
   
   ```bash
   [2021-08-18 11:05:53,690] {celery_executor.py:120} ERROR - Failed to execute 
task No SecretsMasker found!.
   Traceback (most recent call last):
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", 
line 117, in _execute_in_fork
   args.func(args)
 File "/usr/local/lib/python3.6/dist-packages/airflow/cli/cli_parser.py", 
line 48, in command
   return func(*args, **kwargs)
 File "/usr/local/lib/python3.6/dist-packages/airflow/utils/cli.py", line 
91, in wrapper
   return f(*args, **kwargs)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/cli/commands/task_command.py", 
line 212, in task_run
   settings.configure_orm(disable_connection_pool=True)
 File "/usr/local/lib/python3.6/dist-packages/airflow/settings.py", line 
224, in configure_orm
   mask_secret(engine.url.password)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", 
line 91, in mask_secret
   _secrets_masker().add_mask(secret, name)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", 
line 105, in _secrets_masker
   raise RuntimeError("No SecretsMasker found!")
   RuntimeError: No SecretsMasker found!
   [2021-08-18 11:05:53,710: ERROR/ForkPoolWorker-3] Task 
airflow.executors.celery_executor.execute_command[f6a9b0cd-bb0c-414a-a51c-80579f2d2f1e]
 raised unexpected: AirflowException('Celery command failed on host: 
64c3bc97f173',)
   Traceback (most recent call last):
 File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 
412, in trace_task
   R = retval = fun(*args, **kwargs)
 File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 
704, in __protected_call__
   return self.run(*args, **kwargs)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", 
line 88, in execute_command
   _execute_in_fork(command_to_exec)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", 
line 99, in _execute_in_fork
   raise AirflowException('Celery command failed on host: ' + 
get_hostname())
   airflow.exceptions.AirflowException: Celery command failed on host: 
64c3bc97f173
   ```
   
   The erroring line is 105 in `_secrets_masker.py`:
   ```python
   @cache
   def _secrets_masker() -> "SecretsMasker":
       for flt in logging.getLogger('airflow.task').filters:
           if isinstance(flt, SecretsMasker):
               return flt
       raise RuntimeError("No SecretsMasker found!")
   ```






[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400982#comment-17400982
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

knutole commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-901041426


   We are also getting `No SecretsMasker found!` on Airflow 2.1.2... 
   
   Could this be due to breaking changes in the configuration file? 
   
   We have tried setting `hide_sensitive_var_conn_fields = False` to no avail. 
   
   ``` 
   [2021-08-18 11:05:53,690] {celery_executor.py:120} ERROR - Failed to execute 
task No SecretsMasker found!.
   Traceback (most recent call last):
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", 
line 117, in _execute_in_fork
   args.func(args)
 File "/usr/local/lib/python3.6/dist-packages/airflow/cli/cli_parser.py", 
line 48, in command
   return func(*args, **kwargs)
 File "/usr/local/lib/python3.6/dist-packages/airflow/utils/cli.py", line 
91, in wrapper
   return f(*args, **kwargs)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/cli/commands/task_command.py", 
line 212, in task_run
   settings.configure_orm(disable_connection_pool=True)
 File "/usr/local/lib/python3.6/dist-packages/airflow/settings.py", line 
224, in configure_orm
   mask_secret(engine.url.password)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", 
line 91, in mask_secret
   _secrets_masker().add_mask(secret, name)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/utils/log/secrets_masker.py", 
line 105, in _secrets_masker
   raise RuntimeError("No SecretsMasker found!")
   RuntimeError: No SecretsMasker found!
   [2021-08-18 11:05:53,710: ERROR/ForkPoolWorker-3] Task 
airflow.executors.celery_executor.execute_command[f6a9b0cd-bb0c-414a-a51c-80579f2d2f1e]
 raised unexpected: AirflowException('Celery command failed on host: 
64c3bc97f173',)
   Traceback (most recent call last):
 File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 
412, in trace_task
   R = retval = fun(*args, **kwargs)
 File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 
704, in __protected_call__
   return self.run(*args, **kwargs)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", 
line 88, in execute_command
   _execute_in_fork(command_to_exec)
 File 
"/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", 
line 99, in _execute_in_fork
   raise AirflowException('Celery command failed on host: ' + 
get_hostname())
   airflow.exceptions.AirflowException: Celery command failed on host: 
64c3bc97f173
   ```






[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399941#comment-17399941
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

knackjason commented on pull request #7407:
URL: https://github.com/apache/airflow/pull/7407#issuecomment-899735866


   > But there is this PR in progress: #12388
   
   Ah, I didn't realize there was a newer PR. Thanks, @potiuk!




> Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor
> 
>
> Key: AIRFLOW-6786
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6786
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks
>Affects Versions: 1.10.9
>Reporter: Daniel Ferguson
>Assignee: Daniel Ferguson
>Priority: Minor
>
> Add the KafkaProducerHook.
>  Add the KafkaConsumerHook.
>  Add the KafkaSensor which listens to messages with a specific topic.
>  Related Issue:
>  #1311 (Pre-dates Jira Migration)
> Reminder to contributors:
> You must add an Apache License header to all new files
>  Please squash your commits when possible and follow the 7 rules of good Git 
> commits
>  I am new to the community, I am not sure the files are at the right place or 
> missing anything.
> The sensor could be used as the first node of a dag where the second node can 
> be a TriggerDagRunOperator. The messages are polled in a batch and the dag 
> runs are dynamically generated.
> Thanks!
> Note, as per denied PR [#1415|https://github.com/apache/airflow/pull/1415], 
> it is important to mention these integrations are not suitable for 
> low-latency/high-throughput/streaming. For reference, [#1415 
> (comment)|https://github.com/apache/airflow/pull/1415#issuecomment-484429806].
> Co-authored-by: Dan Ferguson 
> [dferguson...@gmail.com|mailto:dferguson...@gmail.com]
>  Co-authored-by: YuanfΞi Zhu
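
The batch-poll-then-fan-out flow described in the issue can be sketched without any Kafka or Airflow dependency. `poll_batch` and `trigger_run` below are hypothetical stand-ins for the proposed KafkaConsumerHook poll and a TriggerDagRunOperator call, respectively:

```python
from typing import List

def poll_batch(queue: List[str], max_messages: int = 10) -> List[str]:
    # Stand-in for a KafkaConsumerHook poll: drain up to max_messages
    # from the topic and advance past them.
    batch, queue[:] = queue[:max_messages], queue[max_messages:]
    return batch

def trigger_run(message: str) -> str:
    # Stand-in for TriggerDagRunOperator: one dynamically generated
    # DAG run per polled message.
    return f"triggered_run_for_{message}"

topic = ["evt-1", "evt-2", "evt-3"]
runs = [trigger_run(m) for m in poll_batch(topic)]
print(runs)
```

As the note about the earlier denied PR stresses, this polling pattern is batch-oriented; it is not a substitute for a low-latency, high-throughput streaming consumer.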





[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399939#comment-17399939
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

potiuk commented on pull request #7407:
URL: https://github.com/apache/airflow/pull/7407#issuecomment-899734636


   But there is this PR in progress: 
https://github.com/apache/airflow/pull/12388






[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399937#comment-17399937
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

potiuk commented on pull request #7407:
URL: https://github.com/apache/airflow/pull/7407#issuecomment-899733940


   Does not look like






[jira] [Commented] (AIRFLOW-6786) Adding KafkaConsumerHook, KafkaProducerHook, and KafkaSensor

2021-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399922#comment-17399922
 ] 

ASF GitHub Bot commented on AIRFLOW-6786:
-

knackjason commented on pull request #7407:
URL: https://github.com/apache/airflow/pull/7407#issuecomment-899724393


   Did this ever get merged in? I'd be interested in having officially 
supported Kafka hooks.






[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394452#comment-17394452
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

gowdra01 commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-893961036


   I am stuck with this too. I am using version 2.1.0 and getting the error:
   *** Failed to fetch log file from worker. 503 Server Error: Service 
Unavailable for url:






[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394442#comment-17394442
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

xuemengran commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-893956847


   In version 2.1.1, I tried to modify the airflow/utils/log/file_task_handler.py 
file to obtain the hostname by reading the log table. 
   I confirmed through debugging that I could get the host information this way, 
but a bigger problem appeared: 
   the task is marked as successful without being scheduled, and the log is still 
not viewable. So I have confirmed that to solve this problem, the host 
information must be written to the task_instance table before the task is 
executed. 
   I think this bug is very important, because it directly affects the use of 
Airflow in distributed scenarios. Please fix it as soon as possible!




> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log shows the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857





[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393918#comment-17393918
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

xuemengran commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-893100953


   Hi,
   I am still seeing the issue in 2.1.1 version, my executor is celery








[jira] [Commented] (AIRFLOW-5703) Airflow - SFTP Operator for multiple files

2021-08-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391415#comment-17391415
 ] 

ASF GitHub Bot commented on AIRFLOW-5703:
-

uranusjr commented on pull request #16792:
URL: https://github.com/apache/airflow/pull/16792#issuecomment-890806062


   Need to fix linter issues. See guide: 
https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#contribution-workflow




> Airflow - SFTP Operator for multiple files
> --
>
> Key: AIRFLOW-5703
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5703
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.10.5
>Reporter: Mattia
>Assignee: Andrey Kucherov
>Priority: Major
>
> *AS* User
> *I WANT TO* download / upload multiple files from sftp server
> *SO THAT*  i need the possibility to add a list of file instead of single one
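The requested multi-file behaviour could be sketched as a loop over a list of remote paths. `FakeSFTPClient` below is a stand-in for a real SFTP connection (e.g. paramiko's `SFTPClient`), used only so the example is self-contained; `download_files` is a hypothetical helper, not the operator's actual API:

```python
import os
import tempfile

class FakeSFTPClient:
    """Stand-in for a real SFTP connection, for illustration only."""
    def get(self, remote_path, local_path):
        with open(local_path, "w") as f:
            f.write(f"contents of {remote_path}")

def download_files(client, remote_paths, local_dir):
    """Download each remote file into local_dir; return the local paths."""
    local_paths = []
    for remote in remote_paths:
        local = os.path.join(local_dir, os.path.basename(remote))
        client.get(remote, local)
        local_paths.append(local)
    return local_paths

tmp = tempfile.mkdtemp()
result = download_files(FakeSFTPClient(), ["/data/a.csv", "/data/b.csv"], tmp)
print([os.path.basename(p) for p in result])  # ['a.csv', 'b.csv']
```

An operator built this way could keep backward compatibility by also accepting a single string and wrapping it in a one-element list.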





[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-07-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388861#comment-17388861
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

YonatanKiron commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-888423682


   Same for us, exactly the same error as @vdusek posted





