[GitHub] dlackty commented on a change in pull request #2306: [AIRFLOW-1214] Fix conversion of int type in BigQueryHook

2018-12-10 Thread GitBox
dlackty commented on a change in pull request #2306: [AIRFLOW-1214] Fix 
conversion of int type in BigQueryHook
URL: https://github.com/apache/incubator-airflow/pull/2306#discussion_r240502685
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -915,7 +915,7 @@ def _bq_cast(string_field, bq_type):
 if string_field is None:
 return None
 elif bq_type == 'INTEGER' or bq_type == 'TIMESTAMP':
-return int(string_field)
+return int(float(string_field))
 
 Review comment:
   @Fokko & @tswast for some reason I didn't see your comments. I'll open 
another PR for this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ron819 commented on issue #2455: [AIRFLOW-1423] Add logs to the scheduler DAG run decision logic

2018-12-10 Thread GitBox
ron819 commented on issue #2455: [AIRFLOW-1423] Add logs to the scheduler DAG 
run decision logic
URL: 
https://github.com/apache/incubator-airflow/pull/2455#issuecomment-446100420
 
 
   @ultrabug was there something else needs to be done in this PR ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ron819 commented on issue #3687: [AIRFLOW-2805] Display multiple timezones in the tooltip on TaskInstances

2018-12-10 Thread GitBox
ron819 commented on issue #3687: [AIRFLOW-2805] Display multiple timezones in 
the tooltip on TaskInstances
URL: 
https://github.com/apache/incubator-airflow/pull/3687#issuecomment-446098744
 
 
   Can this be cherry picked to 1.10.2? Also the Jira ticket is still open .. 
needs to be closed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] wyndhblb commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later

2018-12-10 Thread GitBox
wyndhblb commented on a change in pull request #4163: [AIRFLOW-3319] - 
KubernetsExecutor: Need in try_number in  labels if getting them later
URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240486765
 
 

 ##
 File path: airflow/contrib/executors/kubernetes_executor.py
 ##
 @@ -458,10 +459,16 @@ def _datetime_to_label_safe_datestring(datetime_obj):
 
 def _labels_to_key(self, labels):
 try:
+try_num = 1
+try:
+try_num = int(labels.get('try_number', '1'))
+except ValueError:
+self.log.warn("could not get try_number as an int: %s", 
labels.get('try_number', '1'))
 return (
 labels['dag_id'], labels['task_id'],
 
self._label_safe_datestring_to_datetime(labels['execution_date']),
-labels['try_number'])
+try_num
 
 Review comment:
   Yes, older pods in a live system will not have this try number set, and 
needs to be defaulted to something reasonable.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] wyndhblb commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later

2018-12-10 Thread GitBox
wyndhblb commented on a change in pull request #4163: [AIRFLOW-3319] - 
KubernetsExecutor: Need in try_number in  labels if getting them later
URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240485547
 
 

 ##
 File path: airflow/contrib/executors/kubernetes_executor.py
 ##
 @@ -346,6 +346,7 @@ def run_next(self, next_job):
 namespace=self.namespace, worker_uuid=self.worker_uuid,
 
 Review comment:
   yes, the only call to this worker_config object function


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2701) Clean up dangling backfill dagrun

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716084#comment-16716084
 ] 

ASF GitHub Bot commented on AIRFLOW-2701:
-

feng-tao closed pull request #3562: [WIP][AIRFLOW-2701] Clean up backfill 
dangling dagrun
URL: https://github.com/apache/incubator-airflow/pull/3562
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/jobs.py b/airflow/jobs.py
index 00ede5451d..8f19a67231 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -2008,6 +2008,7 @@ def __init__(
 self.verbose = verbose
 self.conf = conf
 self.rerun_failed_tasks = rerun_failed_tasks
+self.dag_runs = []
 super(BackfillJob, self).__init__(*args, **kwargs)
 
 def _update_counters(self, ti_status):
@@ -2469,6 +2470,7 @@ def _execute_for_run_dates(self, run_dates, ti_status, 
executor, pickle_id,
session=session)
 if dag_run is None:
 continue
+self.dag_runs.append(dag_run)
 
 ti_status.active_runs.append(dag_run)
 ti_status.to_run.update(tis_map or {})
@@ -2540,9 +2542,32 @@ def _execute(self, session=None):
 self.dag_id
 )
 time.sleep(self.delay_on_limit_secs)
+
+except (KeyboardInterrupt, SystemExit):
+for dag_run in self.dag_runs:
+dag_run.refresh_from_db(session)
+make_transient(dag_run)
+
+# Check all the tasks which are not in success state.
+check_state = State.unfinished() + State.finished()
+check_state.remove(State.SUCCESS)
+
+unfinished_tasks = dag_run.get_task_instances(
+state=check_state,
+session=session
+)
+if unfinished_tasks:
+# if there are unfinished tasks and ctrl^c
+# set dag run state to failed
+for task in unfinished_tasks:
+task.set_state(State.FAILED, session)
+dag_run.state = State.FAILED
+else:
+dag_run.state = State.SUCCESS
+session.merge(dag_run)
 finally:
-executor.end()
 session.commit()
+executor.end()
 
 self.log.info("Backfill done. Exiting.")
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Clean up dangling backfill dagrun
> -
>
> Key: AIRFLOW-2701
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2701
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tao Feng
>Assignee: Tao Feng
>Priority: Major
>
> When user tries to backfill and hit ctrol+9, the backfill dagrun will stay as 
> running state. We should set it to failed if it has unfinished tasks.
>  
> In our production, we see lots of these dangling backfill dagrun which will 
> cause as one active dagrun in the next backfill. This may prevent user from 
> backfilling if the max_active_run is reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] feng-tao closed pull request #3562: [WIP][AIRFLOW-2701] Clean up backfill dangling dagrun

2018-12-10 Thread GitBox
feng-tao closed pull request #3562: [WIP][AIRFLOW-2701] Clean up backfill 
dangling dagrun
URL: https://github.com/apache/incubator-airflow/pull/3562
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/jobs.py b/airflow/jobs.py
index 00ede5451d..8f19a67231 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -2008,6 +2008,7 @@ def __init__(
 self.verbose = verbose
 self.conf = conf
 self.rerun_failed_tasks = rerun_failed_tasks
+self.dag_runs = []
 super(BackfillJob, self).__init__(*args, **kwargs)
 
 def _update_counters(self, ti_status):
@@ -2469,6 +2470,7 @@ def _execute_for_run_dates(self, run_dates, ti_status, 
executor, pickle_id,
session=session)
 if dag_run is None:
 continue
+self.dag_runs.append(dag_run)
 
 ti_status.active_runs.append(dag_run)
 ti_status.to_run.update(tis_map or {})
@@ -2540,9 +2542,32 @@ def _execute(self, session=None):
 self.dag_id
 )
 time.sleep(self.delay_on_limit_secs)
+
+except (KeyboardInterrupt, SystemExit):
+for dag_run in self.dag_runs:
+dag_run.refresh_from_db(session)
+make_transient(dag_run)
+
+# Check all the tasks which are not in success state.
+check_state = State.unfinished() + State.finished()
+check_state.remove(State.SUCCESS)
+
+unfinished_tasks = dag_run.get_task_instances(
+state=check_state,
+session=session
+)
+if unfinished_tasks:
+# if there are unfinished tasks and ctrl^c
+# set dag run state to failed
+for task in unfinished_tasks:
+task.set_state(State.FAILED, session)
+dag_run.state = State.FAILED
+else:
+dag_run.state = State.SUCCESS
+session.merge(dag_run)
 finally:
-executor.end()
 session.commit()
+executor.end()
 
 self.log.info("Backfill done. Exiting.")
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jongyoul commented on issue #4057: [AIRFLOW-3216] HiveServer2Hook need a password with LDAP authentication

2018-12-10 Thread GitBox
jongyoul commented on issue #4057: [AIRFLOW-3216] HiveServer2Hook need a 
password with LDAP authentication
URL: 
https://github.com/apache/incubator-airflow/pull/4057#issuecomment-446054890
 
 
   @Fokko @bolkedebruin Sorry for the late reply. I've added a test case. BTW, 
the failed one doesn't look related to my changes. Could you please confirm it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2015: [AIRFLOW-765] Auto detect dag dependency files, variables, and resour…

2018-12-10 Thread GitBox
stale[bot] commented on issue #2015: [AIRFLOW-765] Auto detect dag dependency 
files, variables, and resour…
URL: 
https://github.com/apache/incubator-airflow/pull/2015#issuecomment-446048852
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2026: [AIRFLOW-811] [BugFix] bash_operator does not return full output

2018-12-10 Thread GitBox
stale[bot] commented on issue #2026: [AIRFLOW-811] [BugFix] bash_operator does 
not return full output
URL: 
https://github.com/apache/incubator-airflow/pull/2026#issuecomment-446048844
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #1910: [AIRFLOW-659] Automatic Refresh on DAG Graph View

2018-12-10 Thread GitBox
stale[bot] commented on issue #1910: [AIRFLOW-659] Automatic Refresh on DAG 
Graph View
URL: 
https://github.com/apache/incubator-airflow/pull/1910#issuecomment-446048869
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #1922: [AIRFLOW-675] Add error log to UI

2018-12-10 Thread GitBox
stale[bot] commented on issue #1922: [AIRFLOW-675] Add error log to UI
URL: 
https://github.com/apache/incubator-airflow/pull/1922#issuecomment-446048862
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2014: [AIRFLOW-796] Config options for num_runs, processor_poll_interval, log_level

2018-12-10 Thread GitBox
stale[bot] commented on issue #2014: [AIRFLOW-796] Config options for num_runs, 
processor_poll_interval, log_level
URL: 
https://github.com/apache/incubator-airflow/pull/2014#issuecomment-446048848
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2052: [AIRFLOW-833] Add an `airflow default_config` sub-command

2018-12-10 Thread GitBox
stale[bot] commented on issue #2052: [AIRFLOW-833] Add an `airflow 
default_config` sub-command
URL: 
https://github.com/apache/incubator-airflow/pull/2052#issuecomment-446048841
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #1942: [AIRFLOW-697] Add exclusion of tasks.

2018-12-10 Thread GitBox
stale[bot] commented on issue #1942: [AIRFLOW-697] Add exclusion of tasks.
URL: 
https://github.com/apache/incubator-airflow/pull/1942#issuecomment-446048856
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #748: End-to-end DAG testing

2018-12-10 Thread GitBox
stale[bot] commented on issue #748: End-to-end DAG testing
URL: https://github.com/apache/incubator-airflow/pull/748#issuecomment-446048866
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2223: [AIRFLOW-1076] Add get method for template variable accessor

2018-12-10 Thread GitBox
stale[bot] commented on issue #2223: [AIRFLOW-1076] Add get method for template 
variable accessor
URL: 
https://github.com/apache/incubator-airflow/pull/2223#issuecomment-446019133
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2209: [AIRFLOW-766] Skip conn.commit() when in Auto-commit

2018-12-10 Thread GitBox
stale[bot] commented on issue #2209: [AIRFLOW-766] Skip conn.commit() when in 
Auto-commit
URL: 
https://github.com/apache/incubator-airflow/pull/2209#issuecomment-446019128
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2113: [AIRFLOW-920] Allow marking tasks in zoomed in subdags

2018-12-10 Thread GitBox
stale[bot] commented on issue #2113: [AIRFLOW-920] Allow marking tasks in 
zoomed in subdags
URL: 
https://github.com/apache/incubator-airflow/pull/2113#issuecomment-446019161
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2483: [AIRFLOW-1463] Clear state of queued task

2018-12-10 Thread GitBox
stale[bot] commented on issue #2483: [AIRFLOW-1463] Clear state of queued task
URL: 
https://github.com/apache/incubator-airflow/pull/2483#issuecomment-446019075
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2559: [AIRFLOW-1558] Py3 fix for S3FileTransformOperator

2018-12-10 Thread GitBox
stale[bot] commented on issue #2559: [AIRFLOW-1558] Py3 fix for 
S3FileTransformOperator
URL: 
https://github.com/apache/incubator-airflow/pull/2559#issuecomment-446019058
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2306: [AIRFLOW-1214] Fix conversion of int type in BigQueryHook

2018-12-10 Thread GitBox
stale[bot] commented on issue #2306: [AIRFLOW-1214] Fix conversion of int type 
in BigQueryHook
URL: 
https://github.com/apache/incubator-airflow/pull/2306#issuecomment-446019097
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2513: [AIRFLOW-XXX] Allow setting of keepalives_idle to enable PostgresHook to work with Amazon Redshift

2018-12-10 Thread GitBox
stale[bot] commented on issue #2513: [AIRFLOW-XXX] Allow setting of 
keepalives_idle to enable PostgresHook to work with Amazon Redshift
URL: 
https://github.com/apache/incubator-airflow/pull/2513#issuecomment-446019071
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2595: [AIRFLOW-1592] Add --keep-alive option for gunicorn to airflow config

2018-12-10 Thread GitBox
stale[bot] commented on issue #2595: [AIRFLOW-1592] Add --keep-alive option for 
gunicorn to airflow config
URL: 
https://github.com/apache/incubator-airflow/pull/2595#issuecomment-446019061
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2456: [AIRFLOW-1310] Basic operator to run docker container on Kubernetes

2018-12-10 Thread GitBox
stale[bot] commented on issue #2456: [AIRFLOW-1310] Basic operator to run 
docker container on Kubernetes
URL: 
https://github.com/apache/incubator-airflow/pull/2456#issuecomment-446019083
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2420: [AIRFLOW-1383] Add no_dag_reset option to airflow clear command

2018-12-10 Thread GitBox
stale[bot] commented on issue #2420: [AIRFLOW-1383] Add no_dag_reset option to 
airflow clear command
URL: 
https://github.com/apache/incubator-airflow/pull/2420#issuecomment-446019092
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2598: [AIRFLOW-1595] Change to construct sqlite_hook from connection schema

2018-12-10 Thread GitBox
stale[bot] commented on issue #2598: [AIRFLOW-1595] Change to construct 
sqlite_hook from connection schema
URL: 
https://github.com/apache/incubator-airflow/pull/2598#issuecomment-446019060
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2233: [AIRFLOW-1098] Fix issue in setting parent_dag when loading dags

2018-12-10 Thread GitBox
stale[bot] commented on issue #2233: [AIRFLOW-1098] Fix issue in setting 
parent_dag when loading dags
URL: 
https://github.com/apache/incubator-airflow/pull/2233#issuecomment-446019106
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2228: [AIRFLOW-351] Ensure python_callable is pickleable

2018-12-10 Thread GitBox
stale[bot] commented on issue #2228: [AIRFLOW-351] Ensure python_callable is 
pickleable
URL: 
https://github.com/apache/incubator-airflow/pull/2228#issuecomment-446019101
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2235: [AIRFLOW-1096] Add conn_ids to template_fields

2018-12-10 Thread GitBox
stale[bot] commented on issue #2235: [AIRFLOW-1096] Add conn_ids to 
template_fields
URL: 
https://github.com/apache/incubator-airflow/pull/2235#issuecomment-446019129
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2131: [AIRFLOW-946] call commands with virtualenv if available

2018-12-10 Thread GitBox
stale[bot] commented on issue #2131: [AIRFLOW-946] call commands with 
virtualenv if available
URL: 
https://github.com/apache/incubator-airflow/pull/2131#issuecomment-446019162
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2543: [AIRFLOW-351] fix bug with stop operator

2018-12-10 Thread GitBox
stale[bot] commented on issue #2543: [AIRFLOW-351] fix bug with stop operator
URL: 
https://github.com/apache/incubator-airflow/pull/2543#issuecomment-446019059
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2500: [AIRFLOW-1488] Add the DagRunSensor operator.

2018-12-10 Thread GitBox
stale[bot] commented on issue #2500: [AIRFLOW-1488] Add the DagRunSensor 
operator.
URL: 
https://github.com/apache/incubator-airflow/pull/2500#issuecomment-446019073
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2657: [AIRFLOW-161] New redirect route and extra links

2018-12-10 Thread GitBox
stale[bot] commented on issue #2657: [AIRFLOW-161] New redirect route and extra 
links
URL: 
https://github.com/apache/incubator-airflow/pull/2657#issuecomment-446019038
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2515: [AIRFLOW-1325][WIP] Airflow streaming log backed by ElasticSearch

2018-12-10 Thread GitBox
stale[bot] commented on issue #2515: [AIRFLOW-1325][WIP] Airflow streaming log 
backed by ElasticSearch
URL: 
https://github.com/apache/incubator-airflow/pull/2515#issuecomment-446019063
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2361: [AIRFLOW 1159] DockerOperator can use newer docker.

2018-12-10 Thread GitBox
stale[bot] commented on issue #2361: [AIRFLOW 1159] DockerOperator can use 
newer docker.
URL: 
https://github.com/apache/incubator-airflow/pull/2361#issuecomment-446019096
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2776: [AIRFLOW-1793] fix DockerOperator using docker_conn_id

2018-12-10 Thread GitBox
stale[bot] commented on issue #2776: [AIRFLOW-1793] fix DockerOperator using 
docker_conn_id
URL: 
https://github.com/apache/incubator-airflow/pull/2776#issuecomment-446019016
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2444: [ARIFLOW-1414] Add support for retriggering dependent workflows

2018-12-10 Thread GitBox
stale[bot] commented on issue #2444: [ARIFLOW-1414] Add support for 
retriggering dependent workflows
URL: 
https://github.com/apache/incubator-airflow/pull/2444#issuecomment-446019088
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2503: AIRFLOW-1491 Check celery queue before scheduling commands

2018-12-10 Thread GitBox
stale[bot] commented on issue #2503: AIRFLOW-1491 Check celery queue before 
scheduling commands
URL: 
https://github.com/apache/incubator-airflow/pull/2503#issuecomment-446019069
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2455: [AIRFLOW-1423] Add logs to the scheduler DAG run decision logic

2018-12-10 Thread GitBox
stale[bot] commented on issue #2455: [AIRFLOW-1423] Add logs to the scheduler 
DAG run decision logic
URL: 
https://github.com/apache/incubator-airflow/pull/2455#issuecomment-446019079
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2811: [AIRFLOW-1844] task would not be executed when celery broker recovery

2018-12-10 Thread GitBox
stale[bot] commented on issue #2811: [AIRFLOW-1844] task would not be executed 
when celery broker recovery
URL: 
https://github.com/apache/incubator-airflow/pull/2811#issuecomment-446019011
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2757: [AIRFLOW-1775] Remote File Task Handler

2018-12-10 Thread GitBox
stale[bot] commented on issue #2757: [AIRFLOW-1775] Remote File Task Handler
URL: 
https://github.com/apache/incubator-airflow/pull/2757#issuecomment-446019036
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2192: [AIRFLOW-1044] base_task_runner fix for Task RUn Ignoring dependencies

2018-12-10 Thread GitBox
stale[bot] commented on issue #2192: [AIRFLOW-1044] base_task_runner fix for 
Task RUn Ignoring dependencies
URL: 
https://github.com/apache/incubator-airflow/pull/2192#issuecomment-446019139
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2740: [AIRFLOW-1768] add if to show trigger only if not paused

2018-12-10 Thread GitBox
stale[bot] commented on issue #2740: [AIRFLOW-1768] add if to show trigger only 
if not paused
URL: 
https://github.com/apache/incubator-airflow/pull/2740#issuecomment-446019037
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2794: [AIRFLOW-1822] Add gaiohttp and gthread gunicorn worker class to webserver cli

2018-12-10 Thread GitBox
stale[bot] commented on issue #2794: [AIRFLOW-1822] Add gaiohttp and gthread 
gunicorn worker class to webserver cli
URL: 
https://github.com/apache/incubator-airflow/pull/2794#issuecomment-446019012
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2754: [AIRFLOW-1729] Ignore whole directories from .airflowignore

2018-12-10 Thread GitBox
stale[bot] commented on issue #2754: [AIRFLOW-1729] Ignore whole directories 
from .airflowignore
URL: 
https://github.com/apache/incubator-airflow/pull/2754#issuecomment-446019020
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3497) Allow for port configuration in KubernetesPodOperator

2018-12-10 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3497:


 Summary: Allow for port configuration in KubernetesPodOperator
 Key: AIRFLOW-3497
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3497
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Reporter: Wilson Lian


Allowing for ports to be configured would enable cross-pod communication. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3496) Support multi-container pod in KubernetesPodOperator

2018-12-10 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3496:


 Summary: Support multi-container pod in KubernetesPodOperator
 Key: AIRFLOW-3496
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3496
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Reporter: Wilson Lian


KubernetesPodOperator currently only allows one to run a single-container pod 
(aside from init containers). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3495) DataProcSparkSqlOperator and DataProcHiveOperator should raise error when query and query_uri are both provided

2018-12-10 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3495:


 Summary: DataProcSparkSqlOperator and DataProcHiveOperator should 
raise error when query and query_uri are both provided
 Key: AIRFLOW-3495
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3495
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib
Reporter: Wilson Lian


Exactly 1 of the query and query_uri params will be used. It should be an error 
to provide more than one. Fixing this will make cases like 
[this|https://stackoverflow.com/questions/53424091/unable-to-query-using-file-in-data-proc-hive-operator]
 less confusing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3494) Dask executor has dependency conflict

2018-12-10 Thread Colin Campbell (JIRA)
Colin Campbell created AIRFLOW-3494:
---

 Summary: Dask executor has dependency conflict
 Key: AIRFLOW-3494
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3494
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.10.2
Reporter: Colin Campbell


In the setup.py, the dask extra installs 'distributed>=1.17.1,<2', and requires 
psutl <5. However, version 1.24.1 of dask-distributed now requires psutil >=5. 
[Distributed 
Changelog|https://distributed.readthedocs.io/en/latest/changelog.html#id3]

 

This causes an error when trying to install airflow without first manually 
installing distributed<=1.24.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3370) Enhance current ES handler with stdout capability and more output options

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715509#comment-16715509
 ] 

ASF GitHub Bot commented on AIRFLOW-3370:
-

rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task 
handler additional features
URL: https://github.com/apache/incubator-airflow/pull/4303
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following 
[Airflow-3370](https://issues.apache.org/jira/browse/AIRFLOW-3370)
   
   ### Description
   
   - [x] Add additional options and documentation for using the 
ElasticsearchTaskHandler.
   
   The easiest and most foolproof way to implement a logging solution is to 
write to the standard output streams (stdout and stderr). In the Kubernetes 
ecosystem, given that pods are killed and restarted constantly, this implies 
persistent storage is a requirement for log history preservation.
   
   For the webserver and scheduler components of Airflow, logging to standard 
output streams is built in. However, when tasks are executed in Celery, workers 
will fork off child processes to execute tasks concurrently. Before the child 
process ends, it makes a call to the Airflow task logger, and a task log file 
is written to the file system.
   
   This potentially causes several problems with the Airflow on top of 
Kubernetes architecture. Given that Airflow has a constant stream of log 
output, running an Airflow environment using Celery in a Kubernetes cluster 
requires large amounts of memory resources. As such, when memory resources are 
exceeded, either 1) worker pods are often evicted by Kubernetes, or 2) worker 
output stalls and tasks pile up without completing. A current best-case 
workaround is to run a sidecar container that tails the `stdout` and `stderr` 
streams to read logs from the filesystem of the worker node and then output 
those logs to its own standard output. However, this becomes a scaling issue 
when running multiple deployments, and is unsustainable for massive Airflow 
deployments.
   
   The options that are added in this PR look to circumvent the need for 
persistent volumes in worker nodes when running Airflow on Kubernetes. Workers 
will no longer need to be Stateful Sets and can instead be Deployments. 
   
   The `elasticsearch_write_stdout` flag in `airflow.cfg` will allow the child 
process to write its log output to the parent process standard output stream.
   The `elasticsearch_json_format` flag in `airflow.cfg` allows additional 
optional JSON configuration for task instances based on the `logging` module 
LogRecord attributes. This must be used in conjunction with the 
`elasticsearch_record_labels` configuration. 
   
   A potential use case for these options is precisely when setting up a log 
monitoring stack, such as EFK (Elasticsearch FluentD Kibana), without requiring 
persistent volumes. A FluentD daemon listening on every node is awaiting log 
output on the standard output stream. With the `write_stdout` flag set, it can 
capture the task log information that has been executed on child processes from 
the parent process standard output. Using the `json_format` field, FluentD can 
be configured to filter and specify log records, without needing to parse the 
standard Airflow log formatted record, before sending it off to a destination, 
such as Elasticsearch, where it can stored and handled independent of Airflow 
on Kubernetes.
   
   If either of these options are NULL, or not set, the `es_task_handler.py` 
will function exactly as before, and will only have read functionality. The 
options in this PR simply provide more functionality and options to users.
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
    handler close
   - test_close_stdout_logs
   - test_close_closed_stdout_logs
   - test_close_no_mark_end_stdout_logs
   - test_close_with_no_handler_stdout_logs
   - test_close_with_no_stream_stdout_logs
   
    handler read
   - test_read_stdout_logs
   - test_read_nonexistent_log_stdout_logs
   - test_read_raises_stdout_logs_json_true
   - test_read_timeout_stdout_logs
   - test_read_with_empty_metadata_stdout_logs
   - test_read_with_none_metadata_stdout_logs
   
    handler render_log_id and set_context
   - test_render_log_id_json_true
   - test_set_context_stdout_logs
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. 

[jira] [Commented] (AIRFLOW-3370) Enhance current ES handler with stdout capability and more output options

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715508#comment-16715508
 ] 

ASF GitHub Bot commented on AIRFLOW-3370:
-

rhwang10 closed pull request #4303: [AIRFLOW-3370] Elasticsearch log task 
handler additional features
URL: https://github.com/apache/incubator-airflow/pull/4303
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/airflow_local_settings.py 
b/airflow/config_templates/airflow_local_settings.py
index 45a2f2923c..2bf9f9d0e8 100644
--- a/airflow/config_templates/airflow_local_settings.py
+++ b/airflow/config_templates/airflow_local_settings.py
@@ -59,6 +59,13 @@
 
 END_OF_LOG_MARK = conf.get('elasticsearch', 'ELASTICSEARCH_END_OF_LOG_MARK')
 
+ELASTICSEARCH_WRITE_STDOUT = conf.get('elasticsearch', 
'ELASTICSEARCH_WRITE_STDOUT')
+
+ELASTICSEARCH_JSON_FORMAT = conf.get('elasticsearch', 
'ELASTICSEARCH_JSON_FORMAT')
+
+ELASTICSEARCH_RECORD_LABELS = [label.strip() for label in 
conf.get('elasticsearch',
+   'ELASTICSEARCH_RECORD_LABELS').split(",")]
+
 DEFAULT_LOGGING_CONFIG = {
 'version': 1,
 'disable_existing_loggers': False,
@@ -191,6 +198,9 @@
 'filename_template': FILENAME_TEMPLATE,
 'end_of_log_mark': END_OF_LOG_MARK,
 'host': ELASTICSEARCH_HOST,
+'write_stdout': ELASTICSEARCH_WRITE_STDOUT,
+'json_format': ELASTICSEARCH_JSON_FORMAT,
+'record_labels': ELASTICSEARCH_RECORD_LABELS,
 },
 },
 }
diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index a9473178c1..146192051d 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -588,6 +588,9 @@ hide_sensitive_variable_fields = True
 elasticsearch_host =
 elasticsearch_log_id_template = 
{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}
 elasticsearch_end_of_log_mark = end_of_log
+elasticsearch_write_stdout=
+elasticsearch_json_format=
+elasticsearch_record_labels=
 
 [kubernetes]
 # The repository, tag and imagePullPolicy of the Kubernetes Image for the 
Worker to Run
diff --git a/airflow/utils/log/es_task_handler.py 
b/airflow/utils/log/es_task_handler.py
index 16372c0600..e396504969 100644
--- a/airflow/utils/log/es_task_handler.py
+++ b/airflow/utils/log/es_task_handler.py
@@ -17,8 +17,12 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# Using `from elasticsearch import *` would break elasticsearch mocking used 
in unit test.
+# Using `from elasticsearch import *` breaks es mocking in unit test.
 import elasticsearch
+import json
+import logging
+import sys
+
 import pendulum
 from elasticsearch_dsl import Search
 
@@ -28,6 +32,43 @@
 from airflow.utils.log.logging_mixin import LoggingMixin
 
 
+class ParentStdout():
+"""
+Keep track of the ParentStdout stdout context in child process
+"""
+def __init__(self):
+self.closed = False
+
+def write(self, string):
+sys.__stdout__.write(string)
+
+def close(self):
+self.closed = True
+
+
+class JsonFormatter(logging.Formatter):
+"""
+Custom formatter to allow for fields to be captured in JSON format.
+Fields are added via the RECORD_LABELS list.
+TODO: Move RECORD_LABELS into configs/log_config.py
+"""
+def __init__(self, record_labels, processedTask=None):
+super(JsonFormatter, self).__init__()
+self.processedTask = processedTask
+self.record_labels = record_labels
+
+def _mergeDictionaries(self, dict_1, dict_2):
+merged = dict_1.copy()
+merged.update(dict_2)
+return merged
+
+def format(self, record):
+recordObj = {label: getattr(record, label)
+ for label in self.record_labels}
+log_context = self._mergeDictionaries(recordObj, self.processedTask)
+return json.dumps(log_context)
+
+
 class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
 PAGE = 0
 MAX_LINE_PER_PAGE = 1000
@@ -50,7 +91,8 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
 
 def __init__(self, base_log_folder, filename_template,
  log_id_template, end_of_log_mark,
- host='localhost:9200'):
+ write_stdout=None, json_format=None,
+ record_labels=None, host='localhost:9200'):
 """
 :param base_log_folder: base folder to store logs locally
 :param log_id_template: log id template
@@ -58,7 +100,15 @@ def __init__(self, base_log_folder, filename_template,
 """
 

[GitHub] codecov-io edited a comment on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none

2018-12-10 Thread GitBox
codecov-io edited a comment on issue #4295: AIRFLOW-3452 removed an 
unused/dangerous display-none
URL: 
https://github.com/apache/incubator-airflow/pull/4295#issuecomment-445373465
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=h1)
 Report
   > Merging 
[#4295](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/788bd6fcb35ce6354ae874a508772e58a850683e?src=pr=desc)
 will **increase** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4295/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ##   master   #4295  +/-   ##
   =
   + Coverage   78.08%   78.1%   +0.02% 
   =
 Files 201 201  
 Lines   16462   16464   +2 
   =
   + Hits12854   12859   +5 
   + Misses   36083605   -3
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5)
 | `64.59% <0%> (ø)` | :arrow_up: |
   | 
[airflow/security/kerberos.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==)
 | `76.19% <0%> (+4.76%)` | :arrow_up: |
   | 
[airflow/api/common/experimental/delete\_dag.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9kZWxldGVfZGFnLnB5)
 | `88% <0%> (+5.39%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=footer).
 Last update 
[788bd6f...63481e0](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none

2018-12-10 Thread GitBox
codecov-io edited a comment on issue #4295: AIRFLOW-3452 removed an 
unused/dangerous display-none
URL: 
https://github.com/apache/incubator-airflow/pull/4295#issuecomment-445373465
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=h1)
 Report
   > Merging 
[#4295](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/788bd6fcb35ce6354ae874a508772e58a850683e?src=pr=desc)
 will **increase** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4295/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ##   master   #4295  +/-   ##
   =
   + Coverage   78.08%   78.1%   +0.02% 
   =
 Files 201 201  
 Lines   16462   16464   +2 
   =
   + Hits12854   12859   +5 
   + Misses   36083605   -3
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5)
 | `64.59% <0%> (ø)` | :arrow_up: |
   | 
[airflow/security/kerberos.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==)
 | `76.19% <0%> (+4.76%)` | :arrow_up: |
   | 
[airflow/api/common/experimental/delete\_dag.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9kZWxldGVfZGFnLnB5)
 | `88% <0%> (+5.39%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=footer).
 Last update 
[788bd6f...63481e0](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rhwang10 closed pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features

2018-12-10 Thread GitBox
rhwang10 closed pull request #4303: [AIRFLOW-3370] Elasticsearch log task 
handler additional features
URL: https://github.com/apache/incubator-airflow/pull/4303
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/airflow_local_settings.py 
b/airflow/config_templates/airflow_local_settings.py
index 45a2f2923c..2bf9f9d0e8 100644
--- a/airflow/config_templates/airflow_local_settings.py
+++ b/airflow/config_templates/airflow_local_settings.py
@@ -59,6 +59,13 @@
 
 END_OF_LOG_MARK = conf.get('elasticsearch', 'ELASTICSEARCH_END_OF_LOG_MARK')
 
+ELASTICSEARCH_WRITE_STDOUT = conf.get('elasticsearch', 
'ELASTICSEARCH_WRITE_STDOUT')
+
+ELASTICSEARCH_JSON_FORMAT = conf.get('elasticsearch', 
'ELASTICSEARCH_JSON_FORMAT')
+
+ELASTICSEARCH_RECORD_LABELS = [label.strip() for label in 
conf.get('elasticsearch',
+   'ELASTICSEARCH_RECORD_LABELS').split(",")]
+
 DEFAULT_LOGGING_CONFIG = {
 'version': 1,
 'disable_existing_loggers': False,
@@ -191,6 +198,9 @@
 'filename_template': FILENAME_TEMPLATE,
 'end_of_log_mark': END_OF_LOG_MARK,
 'host': ELASTICSEARCH_HOST,
+'write_stdout': ELASTICSEARCH_WRITE_STDOUT,
+'json_format': ELASTICSEARCH_JSON_FORMAT,
+'record_labels': ELASTICSEARCH_RECORD_LABELS,
 },
 },
 }
diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index a9473178c1..146192051d 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -588,6 +588,9 @@ hide_sensitive_variable_fields = True
 elasticsearch_host =
 elasticsearch_log_id_template = 
{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}
 elasticsearch_end_of_log_mark = end_of_log
+elasticsearch_write_stdout=
+elasticsearch_json_format=
+elasticsearch_record_labels=
 
 [kubernetes]
 # The repository, tag and imagePullPolicy of the Kubernetes Image for the 
Worker to Run
diff --git a/airflow/utils/log/es_task_handler.py 
b/airflow/utils/log/es_task_handler.py
index 16372c0600..e396504969 100644
--- a/airflow/utils/log/es_task_handler.py
+++ b/airflow/utils/log/es_task_handler.py
@@ -17,8 +17,12 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# Using `from elasticsearch import *` would break elasticsearch mocking used 
in unit test.
+# Using `from elasticsearch import *` breaks es mocking in unit test.
 import elasticsearch
+import json
+import logging
+import sys
+
 import pendulum
 from elasticsearch_dsl import Search
 
@@ -28,6 +32,43 @@
 from airflow.utils.log.logging_mixin import LoggingMixin
 
 
+class ParentStdout():
+"""
+Keep track of the ParentStdout stdout context in child process
+"""
+def __init__(self):
+self.closed = False
+
+def write(self, string):
+sys.__stdout__.write(string)
+
+def close(self):
+self.closed = True
+
+
+class JsonFormatter(logging.Formatter):
+"""
+Custom formatter to allow for fields to be captured in JSON format.
+Fields are added via the RECORD_LABELS list.
+TODO: Move RECORD_LABELS into configs/log_config.py
+"""
+def __init__(self, record_labels, processedTask=None):
+super(JsonFormatter, self).__init__()
+self.processedTask = processedTask
+self.record_labels = record_labels
+
+def _mergeDictionaries(self, dict_1, dict_2):
+merged = dict_1.copy()
+merged.update(dict_2)
+return merged
+
+def format(self, record):
+recordObj = {label: getattr(record, label)
+ for label in self.record_labels}
+log_context = self._mergeDictionaries(recordObj, self.processedTask)
+return json.dumps(log_context)
+
+
 class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
 PAGE = 0
 MAX_LINE_PER_PAGE = 1000
@@ -50,7 +91,8 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
 
 def __init__(self, base_log_folder, filename_template,
  log_id_template, end_of_log_mark,
- host='localhost:9200'):
+ write_stdout=None, json_format=None,
+ record_labels=None, host='localhost:9200'):
 """
 :param base_log_folder: base folder to store logs locally
 :param log_id_template: log id template
@@ -58,7 +100,15 @@ def __init__(self, base_log_folder, filename_template,
 """
 super(ElasticsearchTaskHandler, self).__init__(
 base_log_folder, filename_template)
+
 self.closed = False
+self.write_stdout = write_stdout
+self.json_format = json_format
+self.record_labels = record_labels
+
+  

[GitHub] rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features

2018-12-10 Thread GitBox
rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task 
handler additional features
URL: https://github.com/apache/incubator-airflow/pull/4303
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following 
[Airflow-3370](https://issues.apache.org/jira/browse/AIRFLOW-3370)
   
   ### Description
   
   - [x] Add additional options and documentation for using the 
ElasticsearchTaskHandler.
   
   The easiest and most foolproof way to implement a logging solution is to 
write to the standard output streams (stdout and stderr). In the Kubernetes 
ecosystem, given that pods are killed and restarted constantly, this implies 
persistent storage is a requirement for log history preservation.
   
   For the webserver and scheduler components of Airflow, logging to standard 
output streams is built in. However, when tasks are executed in Celery, workers 
will fork off child processes to execute tasks concurrently. Before the child 
process ends, it makes a call to the Airflow task logger, and a task log file 
is written to the file system.
   
   This potentially causes several problems with the Airflow on top of 
Kubernetes architecture. Given that Airflow has a constant stream of log 
output, running an Airflow environment using Celery in a Kubernetes cluster 
requires large amounts of memory resources. As such, when memory resources are 
exceeded, either 1) worker pods are often evicted by Kubernetes, or 2) worker 
output stalls and tasks pile up without completing. A current best-case 
workaround is to run a sidecar container that tails the `stdout` and `stderr` 
streams to read logs from the filesystem of the worker node and then output 
those logs to its own standard output. However, this becomes a scaling issue 
when running multiple deployments, and is unsustainable for massive Airflow 
deployments.
   
   The options that are added in this PR look to circumvent the need for 
persistent volumes in worker nodes when running Airflow on Kubernetes. Workers 
will no longer need to be Stateful Sets and can instead be Deployments. 
   
   The `elasticsearch_write_stdout` flag in `airflow.cfg` will allow the child 
process to write its log output to the parent process standard output stream.
   The `elasticsearch_json_format` flag in `airflow.cfg` allows additional 
optional JSON configuration for task instances based on the `logging` module 
LogRecord attributes. This must be used in conjunction with the 
`elasticsearch_record_labels` configuration. 
   
   A potential use case for these options is precisely when setting up a log 
monitoring stack, such as EFK (Elasticsearch FluentD Kibana), without requiring 
persistent volumes. A FluentD daemon listening on every node is awaiting log 
output on the standard output stream. With the `write_stdout` flag set, it can 
capture the task log information that has been executed on child processes from 
the parent process standard output. Using the `json_format` field, FluentD can 
be configured to filter and specify log records, without needing to parse the 
standard Airflow log formatted record, before sending it off to a destination, 
such as Elasticsearch, where it can stored and handled independent of Airflow 
on Kubernetes.
   
   If either of these options are NULL, or not set, the `es_task_handler.py` 
will function exactly as before, and will only have read functionality. The 
options in this PR simply provide more functionality and options to users.
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
    handler close
   - test_close_stdout_logs
   - test_close_closed_stdout_logs
   - test_close_no_mark_end_stdout_logs
   - test_close_with_no_handler_stdout_logs
   - test_close_with_no_stream_stdout_logs
   
    handler read
   - test_read_stdout_logs
   - test_read_nonexistent_log_stdout_logs
   - test_read_raises_stdout_logs_json_true
   - test_read_timeout_stdout_logs
   - test_read_with_empty_metadata_stdout_logs
   - test_read_with_none_metadata_stdout_logs
   
    handler render_log_id and set_context
   - test_render_log_id_json_true
   - test_set_context_stdout_logs
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that 

[jira] [Created] (AIRFLOW-3493) DAG doc_md only shows up on Graph View

2018-12-10 Thread James Meickle (JIRA)
James Meickle created AIRFLOW-3493:
--

 Summary: DAG doc_md only shows up on Graph View
 Key: AIRFLOW-3493
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3493
 Project: Apache Airflow
  Issue Type: Improvement
  Components: ui
Affects Versions: 1.10.1
Reporter: James Meickle


Currently, setting doc_md on a DAG causes rendered documentation to appear in 
Graph View. This is a great feature, but it could be applied more widely.

Out of the current DAG-level tabs, I would also expect DAG documentation to 
show up on Tree View (it's similar to Graph) and on Details (it has info on the 
overall DAG).

I would not expect it to show up on the other DAG-level pages, since they are 
interactive querying displays.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1585) Documentation (e.g. doc_md) for tasks should be displayed more prominently

2018-12-10 Thread James Meickle (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715493#comment-16715493
 ] 

James Meickle commented on AIRFLOW-1585:


+1 on this, I'd love a short snippet to appear on hover or when you click on a 
task instance.

> Documentation (e.g. doc_md) for tasks should be displayed more prominently 
> ---
>
> Key: AIRFLOW-1585
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1585
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: 1.8.0
>Reporter: Max Grender-Jones
>Priority: Major
>
> We have a web of complex tasks. I love that when I set `dag_md` on an entire 
> DAG it is visible very prominently on the Graph View. However, the same 
> cannot be said of `dag_md` set on individual tasks (until now I've never even 
> had a use for the `Task Details` tab, even there it's not very prominently 
> presented).
> My wishlist of where it would be displayed:
>  - In the mouseover (although if there's a long description, perhaps that 
> would be too much for here)
>  - In the popup that you see when you click on a task (My favourite)
>  - At the top of *all* the tabs for that task, displayed in a prominent 
> break-out fashion (as with the DAG), as opposed to buried in the 'attributes' 
> section



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] stale[bot] commented on issue #3782: [AIRFLOW-2936] Use official Python images as base image for Docker

2018-12-10 Thread GitBox
stale[bot] commented on issue #3782: [AIRFLOW-2936] Use official Python images 
as base image for Docker
URL: 
https://github.com/apache/incubator-airflow/pull/3782#issuecomment-445945137
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3526: [AIRFLOW-2651] Add file system hooks with a common interface

2018-12-10 Thread GitBox
stale[bot] commented on issue #3526: [AIRFLOW-2651] Add file system hooks with 
a common interface
URL: 
https://github.com/apache/incubator-airflow/pull/3526#issuecomment-445945168
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3447: [AIRFLOW-2549] Fix DataProcOperation error-check

2018-12-10 Thread GitBox
stale[bot] commented on issue #3447: [AIRFLOW-2549] Fix DataProcOperation 
error-check
URL: 
https://github.com/apache/incubator-airflow/pull/3447#issuecomment-445945190
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3186: [AIRFLOW-2280]Add feature in CheckIntervalOperator

2018-12-10 Thread GitBox
stale[bot] commented on issue #3186: [AIRFLOW-2280]Add feature in 
CheckIntervalOperator
URL: 
https://github.com/apache/incubator-airflow/pull/3186#issuecomment-445945212
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3713: [AIRFLOW-2428] Add AutoScalingRole key to emr_hook

2018-12-10 Thread GitBox
stale[bot] commented on issue #3713: [AIRFLOW-2428] Add AutoScalingRole key to 
emr_hook
URL: 
https://github.com/apache/incubator-airflow/pull/3713#issuecomment-445945156
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3139: [AIRFLOW-2224] Add support for CSV files in mysql_to_gcs operator

2018-12-10 Thread GitBox
stale[bot] commented on issue #3139: [AIRFLOW-2224] Add support for CSV files 
in mysql_to_gcs operator
URL: 
https://github.com/apache/incubator-airflow/pull/3139#issuecomment-445945232
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2932: [AIRFLOW-1974] Improve Databricks Hook/Operator

2018-12-10 Thread GitBox
stale[bot] commented on issue #2932: [AIRFLOW-1974] Improve Databricks 
Hook/Operator
URL: 
https://github.com/apache/incubator-airflow/pull/2932#issuecomment-445945256
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #2881: Correct http_hook to use conn_type instead of schema

2018-12-10 Thread GitBox
stale[bot] commented on issue #2881: Correct http_hook to use conn_type instead 
of schema
URL: 
https://github.com/apache/incubator-airflow/pull/2881#issuecomment-445945261
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3134: [AIRFLOW-878] Use absolute gunicorn executable location

2018-12-10 Thread GitBox
stale[bot] commented on issue #3134: [AIRFLOW-878] Use absolute gunicorn 
executable location
URL: 
https://github.com/apache/incubator-airflow/pull/3134#issuecomment-445945239
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3786: [AIRFLOW-XXX] Note on min_file_process_interval

2018-12-10 Thread GitBox
stale[bot] commented on issue #3786: [AIRFLOW-XXX] Note on 
min_file_process_interval
URL: 
https://github.com/apache/incubator-airflow/pull/3786#issuecomment-445945129
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3816: [HOLD][AIRFLOW-2973] Use Python 3.6.x everywhere possible

2018-12-10 Thread GitBox
stale[bot] commented on issue #3816: [HOLD][AIRFLOW-2973] Use Python 3.6.x 
everywhere possible
URL: 
https://github.com/apache/incubator-airflow/pull/3816#issuecomment-445945110
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3115: [AIRFLOW-2193] Add ROperator for using R

2018-12-10 Thread GitBox
stale[bot] commented on issue #3115: [AIRFLOW-2193] Add ROperator for using R
URL: 
https://github.com/apache/incubator-airflow/pull/3115#issuecomment-445945249
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor

2018-12-10 Thread GitBox
stale[bot] commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor
URL: 
https://github.com/apache/incubator-airflow/pull/3702#issuecomment-445945162
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3912: [AIRFLOW-3077] Default Not to Raise Error When PyMongo Contruct JSON Data

2018-12-10 Thread GitBox
stale[bot] commented on issue #3912: [AIRFLOW-3077] Default Not to Raise Error 
When PyMongo Contruct JSON Data
URL: 
https://github.com/apache/incubator-airflow/pull/3912#issuecomment-445945088
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models

2018-12-10 Thread GitBox
stale[bot] commented on issue #3858: [AIRFLOW-2929] Add get and set for pool 
class in models
URL: 
https://github.com/apache/incubator-airflow/pull/3858#issuecomment-445945100
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3941: [AIRFLOW-3106] Validate Postgres connection after saving it

2018-12-10 Thread GitBox
stale[bot] commented on issue #3941: [AIRFLOW-3106] Validate Postgres 
connection after saving it
URL: 
https://github.com/apache/incubator-airflow/pull/3941#issuecomment-445945072
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3249: [AIRFLOW-2354] Change task instance run validation to not exclude das…

2018-12-10 Thread GitBox
stale[bot] commented on issue #3249: [AIRFLOW-2354] Change task instance run 
validation to not exclude das…
URL: 
https://github.com/apache/incubator-airflow/pull/3249#issuecomment-445945197
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook …

2018-12-10 Thread GitBox
stale[bot] commented on issue #3722: [AIRFLOW-2759] Add changes to extract 
proxy details at the base hook …
URL: 
https://github.com/apache/incubator-airflow/pull/3722#issuecomment-445945151
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3503: Misc fixes

2018-12-10 Thread GitBox
stale[bot] commented on issue #3503: Misc fixes
URL: 
https://github.com/apache/incubator-airflow/pull/3503#issuecomment-445945173
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3182: Testing oracle-java9-installer for travisci builds

2018-12-10 Thread GitBox
stale[bot] commented on issue #3182: Testing oracle-java9-installer for 
travisci builds
URL: 
https://github.com/apache/incubator-airflow/pull/3182#issuecomment-445945233
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption.

2018-12-10 Thread GitBox
stale[bot] commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS 
encryption.
URL: 
https://github.com/apache/incubator-airflow/pull/3805#issuecomment-445945118
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3229: [AIRFLOW-2325] Add cloudwatch task handler (IN PROGRESS)

2018-12-10 Thread GitBox
stale[bot] commented on issue #3229: [AIRFLOW-2325] Add cloudwatch task handler 
(IN PROGRESS)
URL: 
https://github.com/apache/incubator-airflow/pull/3229#issuecomment-445945219
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3774: [AIRFLOW-2920] Added downward API metadata to Kubernetes pods

2018-12-10 Thread GitBox
stale[bot] commented on issue #3774: [AIRFLOW-2920] Added downward API metadata 
to Kubernetes pods
URL: 
https://github.com/apache/incubator-airflow/pull/3774#issuecomment-445945143
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3250: [WIP] Add dms raw

2018-12-10 Thread GitBox
stale[bot] commented on issue #3250: [WIP] Add dms raw
URL: 
https://github.com/apache/incubator-airflow/pull/3250#issuecomment-445945204
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3867: [AIRFLOW-3011][CLI] Add cmd function printing config

2018-12-10 Thread GitBox
stale[bot] commented on issue #3867: [AIRFLOW-3011][CLI] Add cmd function 
printing config
URL: 
https://github.com/apache/incubator-airflow/pull/3867#issuecomment-445945095
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3891: [AIRFLOW-3029] New Operator - SqlOperator

2018-12-10 Thread GitBox
stale[bot] commented on issue #3891: [AIRFLOW-3029] New Operator - SqlOperator
URL: 
https://github.com/apache/incubator-airflow/pull/3891#issuecomment-445945083
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3796: [AIRFLOW-2824] - Add config to disable default conn creation

2018-12-10 Thread GitBox
stale[bot] commented on issue #3796: [AIRFLOW-2824] - Add config to disable 
default conn creation
URL: 
https://github.com/apache/incubator-airflow/pull/3796#issuecomment-445945124
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #3851: AIRFLOW-3014 Increase possible length of passwords in connection table

2018-12-10 Thread GitBox
stale[bot] commented on issue #3851: AIRFLOW-3014 Increase possible length of 
passwords in connection table
URL: 
https://github.com/apache/incubator-airflow/pull/3851#issuecomment-445945115
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] stale[bot] commented on issue #4055: [AIRFLOW-3206] neutral and clear GPL dependency notice

2018-12-10 Thread GitBox
stale[bot] commented on issue #4055: [AIRFLOW-3206] neutral and clear GPL 
dependency notice
URL: 
https://github.com/apache/incubator-airflow/pull/4055#issuecomment-445945065
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] MarcusSorealheis edited a comment on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none

2018-12-10 Thread GitBox
MarcusSorealheis edited a comment on issue #4295: AIRFLOW-3452 removed an 
unused/dangerous display-none
URL: 
https://github.com/apache/incubator-airflow/pull/4295#issuecomment-445932870
 
 
   @ashb I have not identified any reason for 
https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/templates/airflow/dags.html#L306
 other than they style but there may be another reason that I have not 
pinpointed yet. My vote would be to remove it when we know why it is there as 
the reason could extend beyond overwriting the `display: none`. The entire 
JavaScript section of the file there will be rewritten soon.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tal181 commented on issue #4267: [AIRFLOW-3411] create openfaas hook

2018-12-10 Thread GitBox
tal181 commented on issue #4267: [AIRFLOW-3411]  create openfaas hook
URL: 
https://github.com/apache/incubator-airflow/pull/4267#issuecomment-445943097
 
 
   ping @kaxil 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] apraovjr commented on issue #4290: [AIRFLOW-3282] Implement Azure Kubernetes Service Operator

2018-12-10 Thread GitBox
apraovjr commented on issue #4290: [AIRFLOW-3282] Implement Azure Kubernetes 
Service Operator 
URL: 
https://github.com/apache/incubator-airflow/pull/4290#issuecomment-445935100
 
 
   Reviewers?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] MarcusSorealheis commented on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none

2018-12-10 Thread GitBox
MarcusSorealheis commented on issue #4295: AIRFLOW-3452 removed an 
unused/dangerous display-none
URL: 
https://github.com/apache/incubator-airflow/pull/4295#issuecomment-445932870
 
 
   @ashb I have not identified any reason for 
https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/templates/airflow/dags.html#L306
 other than they style but there may be another reason that I have not 
pinpointed yet. My vote would be to remove it. The entire JavaScript section of 
the file there will be rewritten soon.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features

2018-12-10 Thread GitBox
rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task 
handler additional features
URL: https://github.com/apache/incubator-airflow/pull/4303
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following 
[Airflow-3370](https://issues.apache.org/jira/browse/AIRFLOW-3370)
   
   ### Description
   
   Add additional options and documentation for using the 
ElasticsearchTaskHandler.
   
   The easiest and most foolproof way to implement a logging solution is to 
write to the standard output streams (stdout and stderr). In the Kubernetes 
ecosystem, given that pods are killed and restarted constantly, this implies 
persistent storage is a requirement for log history preservation.
   
   For the webserver and scheduler components of Airflow, logging to standard 
output streams is built in. However, when tasks are executed in Celery, workers 
will fork off child processes to execute tasks concurrently. Before the child 
process ends, it makes a call to the Airflow task logger, and a task log file 
is written to the file system.
   
   This potentially causes several problems with the Airflow on top of 
Kubernetes architecture. Given that Airflow has a constant stream of log 
output, running an Airflow environment using Celery in a Kubernetes cluster 
requires large amounts of memory resources. As such, when memory resources are 
exceeded, either 1) worker pods are often evicted by Kubernetes, or 2) worker 
output stalls and tasks pile up without completing. A current best-case 
workaround is to run a sidecar container that tails the `stdout` and `stderr` 
streams to read logs from the filesystem of the worker node and then output 
those logs to its own standard output. However, this becomes a scaling issue 
when running multiple deployments, and is unsustainable for massive Airflow 
deployments.
   
   The options that are added in this PR look to circumvent the need for 
persistent volumes in worker nodes when running Airflow on Kubernetes. Workers 
will no longer need to be Stateful Sets and can instead be Deployments. 
   
   The `elasticsearch_write_stdout` flag in `airflow.cfg` will allow the child 
process to write its log output to the parent process standard output stream.
   The `elasticsearch_json_format` flag in `airflow.cfg` allows additional 
optional JSON configuration for task instances based on the `logging` module 
LogRecord attributes. This must be used in conjunction with the 
`elasticsearch_record_labels` configuration. 
   
   A potential use case for these options is precisely when setting up a log 
monitoring stack, such as EFK (Elasticsearch FluentD Kibana), without requiring 
persistent volumes. A FluentD daemon listening on every node is awaiting log 
output on the standard output stream. With the `write_stdout` flag set, it can 
capture the task log information that has been executed on child processes from 
the parent process standard output. Using the `json_format` field, FluentD can 
be configured to filter and specify log records, without needing to parse the 
standard Airflow log formatted record, before sending it off to a destination, 
such as Elasticsearch, where it can stored and handled independent of Airflow 
on Kubernetes.
   
   If either of these options are NULL, or not set, the `es_task_handler.py` 
will function exactly as before, and will only have read functionality. The 
options in this PR simply provide more functionality and options to users.
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
    handler close
   - test_close_stdout_logs
   - test_close_closed_stdout_logs
   - test_close_no_mark_end_stdout_logs
   - test_close_with_no_handler_stdout_logs
   - test_close_with_no_stream_stdout_logs
   
    handler read
   - test_read_stdout_logs
   - test_read_nonexistent_log_stdout_logs
   - test_read_raises_stdout_logs_json_true
   - test_read_timeout_stdout_logs
   - test_read_with_empty_metadata_stdout_logs
   - test_read_with_none_metadata_stdout_logs
   
    handler render_log_id and set_context
   - test_render_log_id_json_true
   - test_set_context_stdout_logs
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 

[jira] [Commented] (AIRFLOW-3370) Enhance current ES handler with stdout capability and more output options

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715356#comment-16715356
 ] 

ASF GitHub Bot commented on AIRFLOW-3370:
-

rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task 
handler additional features
URL: https://github.com/apache/incubator-airflow/pull/4303
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following 
[Airflow-3370](https://issues.apache.org/jira/browse/AIRFLOW-3370)
   
   ### Description
   
   Add additional options and documentation for using the 
ElasticsearchTaskHandler.
   
   The easiest and most foolproof way to implement a logging solution is to 
write to the standard output streams (stdout and stderr). In the Kubernetes 
ecosystem, given that pods are killed and restarted constantly, this implies 
persistent storage is a requirement for log history preservation.
   
   For the webserver and scheduler components of Airflow, logging to standard 
output streams is built in. However, when tasks are executed in Celery, workers 
will fork off child processes to execute tasks concurrently. Before the child 
process ends, it makes a call to the Airflow task logger, and a task log file 
is written to the file system.
   
   This potentially causes several problems with the Airflow on top of 
Kubernetes architecture. Given that Airflow has a constant stream of log 
output, running an Airflow environment using Celery in a Kubernetes cluster 
requires large amounts of memory resources. As such, when memory resources are 
exceeded, either 1) worker pods are often evicted by Kubernetes, or 2) worker 
output stalls and tasks pile up without completing. A current best-case 
workaround is to run a sidecar container that tails the `stdout` and `stderr` 
streams to read logs from the filesystem of the worker node and then output 
those logs to its own standard output. However, this becomes a scaling issue 
when running multiple deployments, and is unsustainable for massive Airflow 
deployments.
   
   The options that are added in this PR look to circumvent the need for 
persistent volumes in worker nodes when running Airflow on Kubernetes. Workers 
will no longer need to be Stateful Sets and can instead be Deployments. 
   
   The `elasticsearch_write_stdout` flag in `airflow.cfg` will allow the child 
process to write its log output to the parent process standard output stream.
   The `elasticsearch_json_format` flag in `airflow.cfg` allows additional 
optional JSON configuration for task instances based on the `logging` module 
LogRecord attributes. This must be used in conjunction with the 
`elasticsearch_record_labels` configuration. 
   
   A potential use case for these options is precisely when setting up a log 
monitoring stack, such as EFK (Elasticsearch FluentD Kibana), without requiring 
persistent volumes. A FluentD daemon listening on every node is awaiting log 
output on the standard output stream. With the `write_stdout` flag set, it can 
capture the task log information that has been executed on child processes from 
the parent process standard output. Using the `json_format` field, FluentD can 
be configured to filter and specify log records, without needing to parse the 
standard Airflow log formatted record, before sending it off to a destination, 
such as Elasticsearch, where it can stored and handled independent of Airflow 
on Kubernetes.
   
   If either of these options are NULL, or not set, the `es_task_handler.py` 
will function exactly as before, and will only have read functionality. The 
options in this PR simply provide more functionality and options to users.
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
    handler close
   - test_close_stdout_logs
   - test_close_closed_stdout_logs
   - test_close_no_mark_end_stdout_logs
   - test_close_with_no_handler_stdout_logs
   - test_close_with_no_stream_stdout_logs
   
    handler read
   - test_read_stdout_logs
   - test_read_nonexistent_log_stdout_logs
   - test_read_raises_stdout_logs_json_true
   - test_read_timeout_stdout_logs
   - test_read_with_empty_metadata_stdout_logs
   - test_read_with_none_metadata_stdout_logs
   
    handler render_log_id and set_context
   - test_render_log_id_json_true
   - test_set_context_stdout_logs
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. 

[GitHub] Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is closed

2018-12-10 Thread GitBox
Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is 
closed
URL: 
https://github.com/apache/incubator-airflow/pull/4298#issuecomment-445925323
 
 
   @kaxil Thanks for the Flake8 fix :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure that the session is closed

2018-12-10 Thread GitBox
Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure 
that the session is closed
URL: https://github.com/apache/incubator-airflow/pull/4298#discussion_r240330093
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -423,14 +418,11 @@ def unpause(args, dag=None):
 def set_is_paused(is_paused, args, dag=None):
 dag = dag or get_dag(args)
 
-session = settings.Session()
-dm = session.query(DagModel).filter(
-DagModel.dag_id == dag.dag_id).first()
-dm.is_paused = is_paused
-session.commit()
 
 Review comment:
   So we should not remove any `.commit()` without consideration, but after the 
`create_session`, or `@provide_session` the session is committed anyway. 
Therefore the explicit `.commit()` in the code will only act as an in-between 
commit. We should close the session if we don't use it again directly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure that the session is closed

2018-12-10 Thread GitBox
Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure 
that the session is closed
URL: https://github.com/apache/incubator-airflow/pull/4298#discussion_r240329624
 
 

 ##
 File path: airflow/api/common/experimental/mark_tasks.py
 ##
 @@ -180,18 +180,15 @@ def set_state(task, execution_date, upstream=False, 
downstream=False,
 tis_altered += qry_sub_dag.with_for_update().all()
 for ti in tis_altered:
 ti.state = state
-session.commit()
 else:
 tis_altered = qry_dag.all()
 if len(sub_dag_ids) > 0:
 tis_altered += qry_sub_dag.all()
 
-session.expunge_all()
 
 Review comment:
   I had to dig deeper into this. By default, [the objects are cleaned 
up](https://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.session.Session.commit):
 By default, the `Session` also expires all database loaded state on all 
ORM-managed attributes after transaction commit. This so that subsequent 
operations load the most recent data from the database. This behavior can be 
disabled using the `expire_on_commit=False` option to sessionmaker or the 
`Session`constructor. 
   
   But I've noticed that we explicitly set the `expire_on_commit=False`.
   
https://github.com/apache/incubator-airflow/blob/ded25e16c1fb912019d3d0e5d47d020dccaa54b7/airflow/settings.py#L198
   
   In this case this change would indeed change behaviour. Maybe remove the 
`expire_on_commit ` to make it simpeler?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #3547: [AIRFLOW-2659] Improve Robustness of Operators in Airflow during Infra Outages

2018-12-10 Thread GitBox
Fokko commented on issue #3547: [AIRFLOW-2659] Improve Robustness of Operators 
in Airflow during Infra Outages
URL: 
https://github.com/apache/incubator-airflow/pull/3547#issuecomment-445899544
 
 
   Wouldn't it be preferable to design the operator in such a way, that it will 
pick it up again automagically. Now we add additional configuration etc, which 
increases the complexity of Airflow and the specific operators. Would it be 
possible to integrate this in the operator itself?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later

2018-12-10 Thread GitBox
dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - 
KubernetsExecutor: Need in try_number in  labels if getting them later
URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240277825
 
 

 ##
 File path: airflow/contrib/executors/kubernetes_executor.py
 ##
 @@ -458,10 +459,16 @@ def _datetime_to_label_safe_datestring(datetime_obj):
 
 def _labels_to_key(self, labels):
 try:
+try_num = 1
+try:
+try_num = int(labels.get('try_number', '1'))
+except ValueError:
+self.log.warn("could not get try_number as an int: %s", 
labels.get('try_number', '1'))
 return (
 labels['dag_id'], labels['task_id'],
 
self._label_safe_datestring_to_datetime(labels['execution_date']),
-labels['try_number'])
+try_num
 
 Review comment:
   So, if you upgrade a live system any pods without the label will return 
`try_num` 1. That's the crux of the fix (along with adding the try num label to 
new pods made), right?
   
   Would be a spot for a pep8 recommended (plays nice with vcs) redundant comma
   ```suggestion
   try_num,
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later

2018-12-10 Thread GitBox
dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - 
KubernetsExecutor: Need in try_number in  labels if getting them later
URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240275773
 
 

 ##
 File path: airflow/contrib/executors/kubernetes_executor.py
 ##
 @@ -346,6 +346,7 @@ def run_next(self, next_job):
 namespace=self.namespace, worker_uuid=self.worker_uuid,
 
 Review comment:
   This is the only call to make_pod? If so, then good.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later

2018-12-10 Thread GitBox
dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - 
KubernetsExecutor: Need in try_number in  labels if getting them later
URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240276619
 
 

 ##
 File path: airflow/contrib/executors/kubernetes_executor.py
 ##
 @@ -458,10 +459,16 @@ def _datetime_to_label_safe_datestring(datetime_obj):
 
 def _labels_to_key(self, labels):
 try:
+try_num = 1
 
 Review comment:
   Can this block be moved before the try on 461? The try in the try doesn't 
read well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


  1   2   >