[GitHub] [airflow] potiuk opened a new pull request #7194: [AIRFLOW-6584] Pin cassandra driver

2020-01-16 Thread GitBox
potiuk opened a new pull request #7194: [AIRFLOW-6584] Pin cassandra driver
URL: https://github.com/apache/airflow/pull/7194
 
 
   The 3.21.0 release of the Cassandra driver
   (https://pypi.org/project/cassandra-driver/3.21.0/) broke backwards
   compatibility. We need to pin it to 3.20.2.
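   
   A minimal sketch of what such a pin could look like in the cassandra extra of
   setup.py (the lower bound below is an assumption for illustration; only the
   <3.21.0 cap follows from the description above):
   
```python
# Sketch only: cap cassandra-driver below the release that broke compatibility.
cassandra = [
    'cassandra-driver>=3.13.0,<3.21.0',  # assumed lower bound, illustrative only
]
```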
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-6584) Pin cassandra driver

2020-01-16 Thread Jarek Potiuk (Jira)
Jarek Potiuk created AIRFLOW-6584:
-

 Summary: Pin cassandra driver
 Key: AIRFLOW-6584
 URL: https://issues.apache.org/jira/browse/AIRFLOW-6584
 Project: Apache Airflow
  Issue Type: Improvement
  Components: ci
Affects Versions: 2.0.0
Reporter: Jarek Potiuk


The 3.21.0 release of the Cassandra driver 
([https://pypi.org/project/cassandra-driver/3.21.0/]) broke backwards 
compatibility. We need to pin it to 3.20.2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

2020-01-16 Thread GitBox
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add 
log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##
 File path: airflow/utils/log/es_task_handler.py
 ##
 @@ -255,7 +256,9 @@ def close(self):
 
 # Mark the end of file using end of log mark,
 # so we know where to stop while auto-tailing.
-self.handler.stream.write(self.end_of_log_mark)
+if self.write_stdout:
+print()
+self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of the end-of-log mark is already handled by Andrii's initial 
changes for the stdout support, so it won't be displayed to the users. Please see 
https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
   
   # end_of_log_mark may contain characters like '\n' which is needed to
   # have the log uploaded but will not be stored in elasticsearch.
   metadata['end_of_log'] = False if not logs \
   else logs[-1].message == self.end_of_log_mark.strip()
   
   then
   
   # If we hit the end of the log, remove the actual end_of_log message
   # to prevent it from showing in the UI.
   i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
   message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in 
the tests now.
   Can you please check if this is clear to you now?
   
   The log_id is constructed on the Elasticsearch side, but it needs the dag_id, task_id, 
execution_date and try_number to compute the log_id, and that is why you need 
to use emit() to include this information. In my test case, here is how I 
simulate the logic of the Elasticsearch processors:
   
   msg['log_id'] = self.log_id_template.format(
   dag_id=msg['dag_id'],
   task_id=msg['task_id'],
   execution_date=msg['execution_date'],
   try_number=msg['try_number'])
   msg['message'] = msg['message'].strip()
   msg['offset'] = 100
   
   To do the same, my Elasticsearch ingest processor pipeline looks like the 
following:
   
   
   
   "description" : "cluster json log Pipeline",
   "processors" : [
 {
   "rename" : {
 "field" : "message",
 "target_field" : "raw_message"
   }
 },
 {
   "json" : {
 "field" : "raw_message",
 "add_to_root" : false,
 "target_field" : "json_target"
   }
 },
 {
   "grok" : {
 "field" : "json_target.message",
 "patterns" : [
   "Job %{DATA:job_id}: Subtask %{DATA} %{GREEDYDATA:json_msg}",
   "%{GREEDYDATA}"
 ]
   }
 },
 {
   "json" : {
 "field" : "json_msg",
 "add_to_root" : true,
 "if" : "ctx.job_id != null"
   }
 },
 {
   "json" : {
 "field" : "raw_message",
 "add_to_root" : true,
 "if" : "ctx.job_id == null"
   }
 },
 {
   "remove" : {
 "field" : "json_msg",
 "ignore_missing" : true
   }
 },
 {
   "remove" : {
 "field" : "json_target"
   }
 },
 {
   "set" : {
 "field" : "event.kind",
 "value" : "tasks",
 "if" : "ctx.message != null"
   }
 },
 {
   "set" : {
 "field" : "event.dataset",
 "value" : "airflow",
 "if" : "ctx.dag_id != null && ctx.task_id != null"
   }
 },
 {
   "set" : {
 "field" : "log_id",
 "value" : 
"{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}",
 "if" : "ctx.event?.dataset == 'airflow'"
   }
 },
 {
   "set" : {
 "field" : "offset",
 "value" : "{{log.offset}}",
 "if" : "ctx.event?.dataset == 'airflow'"
   }
 }
   ],
   "on_failure" : [
 {
   "set" : {
 "field" : "error.message",
 "value" : "{{ _ingest.on_failure_message }}"
   }
 }
   ]
  }
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

2020-01-16 Thread GitBox
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add 
log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367792823
 
 

 ##
 File path: airflow/utils/log/es_task_handler.py
 ##
 @@ -255,7 +256,9 @@ def close(self):
 
 # Mark the end of file using end of log mark,
 # so we know where to stop while auto-tailing.
-self.handler.stream.write(self.end_of_log_mark)
+if self.write_stdout:
+print()
+self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   With regard to the request to split this change into two PRs: can you please 
check my test case test_close_with_log_id. I need to use the separate index 
parameter to safely test the log_id logic correctly as I showed above, and to 
ensure that the end-of-log mark is removed correctly, that the number of messages 
is as expected, etc. It is needed in this PR to improve the testability of the 
code. Can you please check whether this is reasonable?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

2020-01-16 Thread GitBox
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add 
log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##
 File path: airflow/utils/log/es_task_handler.py
 ##
 @@ -255,7 +256,9 @@ def close(self):
 
 # Mark the end of file using end of log mark,
 # so we know where to stop while auto-tailing.
-self.handler.stream.write(self.end_of_log_mark)
+if self.write_stdout:
+print()
+self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of the end-of-log mark is already handled by Andrii's initial 
changes for the stdout support, so it won't be displayed to the users. Please see 
https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
   
   # end_of_log_mark may contain characters like '\n' which is needed to
   # have the log uploaded but will not be stored in elasticsearch.
   metadata['end_of_log'] = False if not logs \
   else logs[-1].message == self.end_of_log_mark.strip()
   
   then
   
   # If we hit the end of the log, remove the actual end_of_log message
   # to prevent it from showing in the UI.
   i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
   message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in 
the tests now.
   Can you please check if this is clear to you now?
   
   The log_id is constructed on the Elasticsearch side, but it needs the dag_id, task_id, 
execution_date and try_number to compute the log_id, and that is why you need 
to use emit() to include this information. In my test case, here is how I 
simulate the logic of the Elasticsearch processors:
   msg['log_id'] = self.log_id_template.format(
   dag_id=msg['dag_id'],
   task_id=msg['task_id'],
   execution_date=msg['execution_date'],
   try_number=msg['try_number'])
   msg['message'] = msg['message'].strip()
   msg['offset'] = 100
   
   To do the same, my Elasticsearch ingest processor pipeline looks like the 
following:
   
   
   
   `{
   "description" : "cluster json log Pipeline",
   "processors" : [
 {
   "rename" : {
 "field" : "message",
 "target_field" : "json_msg"
   }
 },
 {
   "json" : {
 "field" : "json_msg",
 "add_to_root" : true
   }
 },
 {
   "rename" : {
 "field" : "message",
 "target_field" : "outter_msg"
   }
 },  
 {
   "grok" : {
 "field" : "outter_msg",
 "patterns" : [
   "%{DATA} {%{DATA}, \"message\": \"%{DATA:message}\", 
%{GREEDYDATA}}",
   "%{GREEDYDATA}"
 ]
   }  
 },
 {
   "set" : {
"field" : "event.kind",
 "value" : "tasks",
 "if" : "ctx.message != null"  
   }
 },
 {
   "rename" : {
 "field" : "outter_msg",
 "target_field" : "message",
 "if" : "ctx.message == null"
   }
 },  
 {
   "remove" : {
 "field" : "outter_msg",
 "ignore_missing" : true
   }
 },  
 {
   "set" : {
 "field" : "event.dataset",
 "value" : "airflow",
 "if" : "ctx.dag_id != null && ctx.task_id != null"
   }
 },
{
   "set" : {
 "field" : "log_id",
 "value" : 
"{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}",
 "if" : "ctx.event?.dataset == 'airflow'"
   }
 },
{
   "set" : {
 "field" : "offset",
 "value" : "{{log.offset}}",
 "if" : "ctx.event?.dataset == 'airflow'"
   }
 }   
   ],
   "on_failure" : [
 {
   "set" : {
 "field" : "error.message",
 "value" : "{{ _ingest.on_failure_message }}"
   }
 }
   ]
 }
   `
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

2020-01-16 Thread GitBox
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add 
log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##
 File path: airflow/utils/log/es_task_handler.py
 ##
 @@ -255,7 +256,9 @@ def close(self):
 
 # Mark the end of file using end of log mark,
 # so we know where to stop while auto-tailing.
-self.handler.stream.write(self.end_of_log_mark)
+if self.write_stdout:
+print()
+self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of end-of-log mark is already handled by Andrii's initial 
changes for the stdout support: 
https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
   
   # end_of_log_mark may contain characters like '\n' which is needed to
   # have the log uploaded but will not be stored in elasticsearch.
   metadata['end_of_log'] = False if not logs \
   else logs[-1].message == self.end_of_log_mark.strip()
   
   then
   
   # If we hit the end of the log, remove the actual end_of_log message
   # to prevent it from showing in the UI.
   i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
   message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in 
the tests now.
   Can you please check if this is clear to you now?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

2020-01-16 Thread GitBox
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add 
log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r36777
 
 

 ##
 File path: airflow/utils/log/es_task_handler.py
 ##
 @@ -255,7 +256,9 @@ def close(self):
 
 # Mark the end of file using end of log mark,
 # so we know where to stop while auto-tailing.
-self.handler.stream.write(self.end_of_log_mark)
+if self.write_stdout:
+print()
+self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   I did observe that occasionally the end-of-log mark is not on its own separate 
line in the console, which can then confuse Elasticsearch. So in this PR I always 
add a print() right before the end-of-log mark is printed; this is just a 
reliability fix, given that we need the end-of-log mark on a separate log line for 
this to work.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] potiuk commented on a change in pull request #7193: [AIRFLOW-XXXX] Adds branching strategy to documentation

2020-01-16 Thread GitBox
potiuk commented on a change in pull request #7193: [AIRFLOW-XXXX] Adds 
branching strategy to documentation
URL: https://github.com/apache/airflow/pull/7193#discussion_r367783481
 
 

 ##
 File path: CONTRIBUTING.rst
 ##
 @@ -152,6 +152,17 @@ these guidelines:
 -   Adhere to guidelines for commit messages described in this `article 
`__.
 This makes the lives of those who come after you a lot easier.
 
+Airflow Git Branches
+
+
+All new development in Airflow happens in ``master`` branch. All PRs should 
target that branch.
+We also have ``v1-10-test`` branch which is used to test ``1.10.x`` series of 
Airflow and where committers
+(and only committers) cherry-pick selected commits from the master branch. The 
``v1-10-test`` branch might be
 
 Review comment:
   I think it's OK as it is now with the PR template - the template is shown AFTER 
you have already created the PR, and master is the default branch - you'd have to 
make a deliberate effort to change the PR target branch, and by that time you have 
probably read about the branching strategy :).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] potiuk commented on a change in pull request #7193: [AIRFLOW-XXXX] Adds branching strategy to documentation

2020-01-16 Thread GitBox
potiuk commented on a change in pull request #7193: [AIRFLOW-XXXX] Adds 
branching strategy to documentation
URL: https://github.com/apache/airflow/pull/7193#discussion_r367783100
 
 

 ##
 File path: CONTRIBUTING.rst
 ##
 @@ -152,6 +152,17 @@ these guidelines:
 -   Adhere to guidelines for commit messages described in this `article 
`__.
 This makes the lives of those who come after you a lot easier.
 
+Airflow Git Branches
+
+
+All new development in Airflow happens in ``master`` branch. All PRs should 
target that branch.
+We also have ``v1-10-test`` branch which is used to test ``1.10.x`` series of 
Airflow and where committers
+(and only committers) cherry-pick selected commits from the master branch. The 
``v1-10-test`` branch might be
 
 Review comment:
   Good point. I reworded it slightly and added `-x`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] zhongjiajie commented on a change in pull request #7193: [AIRFLOW-XXXX] Adds branching strategy to documentation

2020-01-16 Thread GitBox
zhongjiajie commented on a change in pull request #7193: [AIRFLOW-XXXX] Adds 
branching strategy to documentation
URL: https://github.com/apache/airflow/pull/7193#discussion_r367751853
 
 

 ##
 File path: CONTRIBUTING.rst
 ##
 @@ -152,6 +152,17 @@ these guidelines:
 -   Adhere to guidelines for commit messages described in this `article 
`__.
 This makes the lives of those who come after you a lot easier.
 
+Airflow Git Branches
+
+
+All new development in Airflow happens in ``master`` branch. All PRs should 
target that branch.
+We also have ``v1-10-test`` branch which is used to test ``1.10.x`` series of 
Airflow and where committers
+(and only committers) cherry-pick selected commits from the master branch. The 
``v1-10-test`` branch might be
 
 Review comment:
   Should we add a hint to only use `cherry-pick -x` from master? And should we 
also add this to the 
https://github.com/apache/airflow/blob/master/.github/PULL_REQUEST_TEMPLATE.md 
checklist, to give contributors a more direct hint?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #7178: [AIRFLOW-6572] Move AWS classes to providers.amazon.aws package

2020-01-16 Thread GitBox
codecov-io edited a comment on issue #7178: [AIRFLOW-6572] Move AWS classes to 
providers.amazon.aws package
URL: https://github.com/apache/airflow/pull/7178#issuecomment-574766638
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7178?src=pr=h1) 
Report
   > Merging 
[#7178](https://codecov.io/gh/apache/airflow/pull/7178?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/50efda5c69c1ddfaa869b408540182fb19f1a286?src=pr=desc)
 will **decrease** coverage by `1.04%`.
   > The diff coverage is `93.61%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/7178/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/7178?src=pr=tree)
   
    ```diff
    @@            Coverage Diff             @@
    ##           master    #7178      +/-   ##
    ==========================================
    - Coverage   85.37%   84.32%    -1.05%     
    ==========================================
      Files         723      753      +30     
      Lines       39558    39685     +127     
    ==========================================
    - Hits        33771    33466     -305     
    - Misses       5787     6219     +432     
    ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/7178?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[...rflow/contrib/sensors/sagemaker\_training\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvc2FnZW1ha2VyX3RyYWluaW5nX3NlbnNvci5weQ==)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/contrib/operators/s3\_list\_operator.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9zM19saXN0X29wZXJhdG9yLnB5)
 | `100% <100%> (ø)` | :arrow_up: |
   | 
[.../contrib/operators/emr\_create\_job\_flow\_operator.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9lbXJfY3JlYXRlX2pvYl9mbG93X29wZXJhdG9yLnB5)
 | `100% <100%> (+8%)` | :arrow_up: |
   | 
[...ntrib/operators/emr\_terminate\_job\_flow\_operator.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9lbXJfdGVybWluYXRlX2pvYl9mbG93X29wZXJhdG9yLnB5)
 | `100% <100%> (+5%)` | :arrow_up: |
   | 
[.../contrib/operators/sagemaker\_transform\_operator.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9zYWdlbWFrZXJfdHJhbnNmb3JtX29wZXJhdG9yLnB5)
 | `100% <100%> (+8.1%)` | :arrow_up: |
   | 
[airflow/providers/amazon/aws/hooks/glue\_catalog.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9ob29rcy9nbHVlX2NhdGFsb2cucHk=)
 | `100% <100%> (ø)` | |
   | 
[airflow/contrib/sensors/emr\_job\_flow\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvZW1yX2pvYl9mbG93X3NlbnNvci5weQ==)
 | `100% <100%> (+4.54%)` | :arrow_up: |
   | 
[...flow/contrib/sensors/sagemaker\_transform\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvc2FnZW1ha2VyX3RyYW5zZm9ybV9zZW5zb3IucHk=)
 | `100% <100%> (ø)` | :arrow_up: |
   | 
[...ample\_dags/example\_emr\_job\_flow\_automatic\_steps.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX2Vtcl9qb2JfZmxvd19hdXRvbWF0aWNfc3RlcHMucHk=)
 | `100% <100%> (ø)` | :arrow_up: |
   | 
[...roviders/amazon/aws/sensors/sagemaker\_transform.py](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9zZW5zb3JzL3NhZ2VtYWtlcl90cmFuc2Zvcm0ucHk=)
 | `100% <100%> (ø)` | |
   | ... and [108 
more](https://codecov.io/gh/apache/airflow/pull/7178/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/7178?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/7178?src=pr=footer). 
Last update 
[50efda5...51410da](https://codecov.io/gh/apache/airflow/pull/7178?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] mbelang commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2020-01-16 Thread GitBox
mbelang commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-575438090
 
 
   this mitigated the problem at least :)
   
    ```
    AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: '{ "_request_timeout": "50" }'
    ```
   
   What is the default timeout currently?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-6583) (BigQuery) Add query_params to templated_fields

2020-01-16 Thread Jithin Sukumar (Jira)
Jithin Sukumar created AIRFLOW-6583:
---

 Summary: (BigQuery) Add query_params to templated_fields
 Key: AIRFLOW-6583
 URL: https://issues.apache.org/jira/browse/AIRFLOW-6583
 Project: Apache Airflow
  Issue Type: New Feature
  Components: gcp
Affects Versions: 1.10.7
Reporter: Jithin Sukumar
Assignee: Jithin Sukumar


To query time-partitioned tables, I am passing query_params like this:

yesterday = Variable.get('yesterday', '{{yesterday_ds}}')
today = Variable.get('today', '{{ds}}')
...
query_params = [{'name': 'yesterday', 'parameterType': {'type': 'STRING'},
                 'parameterValue': {'value': yesterday}},
                {'name': 'today', 'parameterType': {'type': 'STRING'},
                 'parameterValue': {'value': today}}]

query_params needs to be a templated field.
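
A minimal sketch of how this could be used once query_params is part of the
operator's template_fields (the import path and other parameters below follow the
existing BigQueryOperator API; templated rendering of query_params is the feature
this issue requests and is assumed here):

```python
# Hypothetical usage sketch: assumes query_params has been added to
# BigQueryOperator.template_fields so the Jinja macros render at run time.
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

query_partitioned = BigQueryOperator(
    task_id='query_partitioned_table',
    sql='SELECT * FROM my_dataset.my_table WHERE ds BETWEEN @yesterday AND @today',
    use_legacy_sql=False,
    query_params=[
        {'name': 'yesterday', 'parameterType': {'type': 'STRING'},
         'parameterValue': {'value': '{{ yesterday_ds }}'}},  # rendered by templating
        {'name': 'today', 'parameterType': {'type': 'STRING'},
         'parameterValue': {'value': '{{ ds }}'}},
    ],
    dag=dag,  # assumes an existing DAG object
)
```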



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read time out error

2020-01-16 Thread GitBox
maxirus commented on issue #6643: [AIRFLOW-6040] Fix KubernetesJobWatcher Read 
time out error
URL: https://github.com/apache/airflow/pull/6643#issuecomment-575436903
 
 
   @mbelang No. Between the holidays & work I have not had time. Hoping to have 
some time this weekend.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] rconroy293 commented on a change in pull request #7119: [AIRFLOW-5840] Add operator extra link to external task sensor

2020-01-16 Thread GitBox
rconroy293 commented on a change in pull request #7119: [AIRFLOW-5840] Add 
operator extra link to external task sensor
URL: https://github.com/apache/airflow/pull/7119#discussion_r367720047
 
 

 ##
 File path: airflow/sensors/external_task_sensor.py
 ##
 @@ -16,22 +16,71 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
 import datetime
 import os
-from typing import Optional, Union
+from typing import FrozenSet, Optional, Union
 
 from sqlalchemy import func
 
+from airflow.configuration import conf
 from airflow.exceptions import AirflowException
-from airflow.models import DagBag, DagModel, DagRun, TaskInstance
+from airflow.models import BaseOperatorLink, DagBag, DagModel, DagRun, 
TaskInstance
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.sensors.base_sensor_operator import BaseSensorOperator
 from airflow.utils.decorators import apply_defaults
 from airflow.utils.session import provide_session
 from airflow.utils.state import State
 
 
+def get_possible_target_execution_dates(execution_date, execution_delta, 
execution_date_fn):
+"""
+Gets the execution date(s) of an external DAG for which an
+ExternalTaskSensor should succeed on. Default is the execution
+date itself, but it may be modified if a non-null execution delta
+or execution date function is passed in.
+
+:param execution_date: The execution date of the sensor
+:type execution_date: datetime.datetime
+:param execution_delta: Time difference between the sensor
+execution date and the target DAG run execution date. Positive
+delta looks back in time.
+:type execution_delta: Optional[datetime.timedelta]
+:param execution_date_fn: Function to compute the execution date(s)
+of the target DAG run to look at given the sensor's execution
+date.
+:type execution_date_fn: Optional[Callable]
+:return: Execution date(s) to wait for
+:rtype: List[datetime.datetime]
+"""
+if execution_delta:
+dttm = execution_date - execution_delta
+elif execution_date_fn:
+dttm = execution_date_fn(execution_date)
+else:
+dttm = execution_date
+
+return dttm if isinstance(dttm, list) else [dttm]
+
+
+class ExternalTaskLink(BaseOperatorLink):
+name = 'External DAG'
+
+def get_link(self, operator, dttm):
+possible_execution_dates = get_possible_target_execution_dates(
+execution_date=dttm,
+execution_delta=getattr(operator, 'execution_delta', None),
+execution_date_fn=None,
 
 Review comment:
   @ashb I like the idea of using XCom. How would we be able to do an 
`xcom_pull` in the `get_link` method? It looks like we only have access to the 
serialized operator and the execution date, but not a task instance.
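
   For illustration only, a hedged sketch (not the PR's implementation; the XCom
   key and the assumption that XCom.get_one accepts these keyword arguments are
   mine) of how a get_link() might read a value the task previously pushed to XCom:

```python
# Hedged sketch: assumes the sensor pushed the resolved target execution date
# to XCom under the key below while it ran.
from airflow.models import XCom


class ExternalTaskLink(BaseOperatorLink):
    name = 'External DAG'

    def get_link(self, operator, dttm):
        target_date = XCom.get_one(
            execution_date=dttm,
            dag_id=operator.dag_id,
            task_id=operator.task_id,
            key='target_execution_date',  # hypothetical key
        )
        # Assumed URL shape for the external DAG's tree view.
        return '/tree?dag_id={}&execution_date={}'.format(
            operator.external_dag_id, target_date)
```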


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] aijamalnk opened a new issue #235: Add updated "Roadmap" for the project

2020-01-16 Thread GitBox
aijamalnk opened a new issue #235: Add updated "Roadmap" for the project
URL: https://github.com/apache/airflow-site/issues/235
 
 
   Currently the website leads to the cwiki page 
https://cwiki.apache.org/confluence/display/AIRFLOW/Roadmap which seems to be 
outdated. 
   
   It is important for users to know whether the project is well maintained 
before adopting it, and to have clear visibility into changes that are coming to 
the project.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] aijamalnk opened a new issue #234: Add a “Blog snippet” to the Homepage

2020-01-16 Thread GitBox
aijamalnk opened a new issue #234: Add a “Blog snippet” to the Homepage
URL: https://github.com/apache/airflow-site/issues/234
 
 
   Add a “Blog snippet” to the Homepage, like on https://beam.apache.org/ but 
prettier. It can look like the Announcements section (meetup & launch of Cloud 
Data Fusion) on the https://cdap.io/ homepage.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] aijamalnk opened a new issue #233: Add “Resources” section in the Airflow website

2020-01-16 Thread GitBox
aijamalnk opened a new issue #233: Add “Resources” section in the Airflow 
website
URL: https://github.com/apache/airflow-site/issues/233
 
 
   Add a “Resources” section to the community page after “Join the community” and 
before “Primary Members Committers”. Also add a menu to the “Community” page that 
shows users what is on the page at a glance. Here is a good example: 
https://beam.apache.org/community/presentation-materials/
   
   - This place will have logos, and other promo materials 
https://cwiki.apache.org/confluence/display/AIRFLOW/File+lists
   - This place will have the Airflow book 
https://www.manning.com/books/data-pipelines-with-apache-airflow


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-3788) Add a landing page for Apache Airflow

2020-01-16 Thread Aizhamal Nurmamat kyzy (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aizhamal Nurmamat kyzy resolved AIRFLOW-3788.
-
Resolution: Fixed

New Airflow site: [https://airflow.apache.org/]

> Add a landing page for Apache Airflow
> -
>
> Key: AIRFLOW-3788
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3788
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: documentation, project-management
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Minor
>
> Adding a landing page for Airflow website with description of the product, 
> core feature + navigation to documentation, community and blog pages, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3607) Decreasing scheduler delay between tasks

2020-01-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017549#comment-17017549
 ] 

ASF subversion and git services commented on AIRFLOW-3607:
--

Commit 50efda5c69c1ddfaa869b408540182fb19f1a286 in airflow's branch 
refs/heads/master from amichai07
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=50efda5 ]

[AIRFLOW-3607] Only query DB once per DAG run for TriggerRuleDep (#4751)

This decreases scheduler delay between tasks by about 20% for larger DAGs,
sometimes more for larger or more complex DAGs.

The delay between tasks can be a major issue, especially when we have DAGs with 
many subdags. It turns out that the scheduling process spends plenty of time in
dependency checking; the trigger rule dependency used to call the DB for
each task instance, and we made it call the DB just once for each dag_run.

> Decreasing scheduler delay between tasks
> 
>
> Key: AIRFLOW-3607
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3607
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0, 1.10.1, 1.10.2
> Environment: ubuntu 14.04
>Reporter: Amichai Horvitz
>Assignee: Amichai Horvitz
>Priority: Major
> Fix For: 1.10.8
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I came across the TODO in airflow/ti_deps/deps/trigger_rule_dep (line 52) 
> that says instead of checking the query for every task let the tasks report 
> to the dagrun. I have a dag with many tasks and the delay between tasks can 
> rise to 10 seconds or more, I already changed the configuration, added 
> processes and memory, checked the code and did research, profiling and other 
> experiments. I hope that this change will make a drastic change in the delay. 
> I would be happy to discuss this solution, the research and other solutions 
> for this issue.  
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-3607) Decreasing scheduler delay between tasks

2020-01-16 Thread Ash Berlin-Taylor (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3607.

Fix Version/s: 1.10.8
   Resolution: Fixed

> Decreasing scheduler delay between tasks
> 
>
> Key: AIRFLOW-3607
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3607
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0, 1.10.1, 1.10.2
> Environment: ubuntu 14.04
>Reporter: Amichai Horvitz
>Assignee: Amichai Horvitz
>Priority: Major
> Fix For: 1.10.8
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I came across the TODO in airflow/ti_deps/deps/trigger_rule_dep (line 52) 
> that says instead of checking the query for every task let the tasks report 
> to the dagrun. I have a dag with many tasks and the delay between tasks can 
> rise to 10 seconds or more, I already changed the configuration, added 
> processes and memory, checked the code and did research, profiling and other 
> experiments. I hope that this change will make a drastic change in the delay. 
> I would be happy to discuss this solution, the research and other solutions 
> for this issue.  
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] ashb commented on issue #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run

2020-01-16 Thread GitBox
ashb commented on issue #4751: [AIRFLOW-3607] collected trigger rule dep check 
per dag run
URL: https://github.com/apache/airflow/pull/4751#issuecomment-575384842
 
 
   @amichai07 We got there in the end!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb merged pull request #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run

2020-01-16 Thread GitBox
ashb merged pull request #4751: [AIRFLOW-3607] collected trigger rule dep check 
per dag run
URL: https://github.com/apache/airflow/pull/4751
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3607) Decreasing scheduler delay between tasks

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017548#comment-17017548
 ] 

ASF GitHub Bot commented on AIRFLOW-3607:
-

ashb commented on pull request #4751: [AIRFLOW-3607] collected trigger rule dep 
check per dag run
URL: https://github.com/apache/airflow/pull/4751
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Decreasing scheduler delay between tasks
> 
>
> Key: AIRFLOW-3607
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3607
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0, 1.10.1, 1.10.2
> Environment: ubuntu 14.04
>Reporter: Amichai Horvitz
>Assignee: Amichai Horvitz
>Priority: Major
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I came across the TODO in airflow/ti_deps/deps/trigger_rule_dep (line 52) 
> that says instead of checking the query for every task let the tasks report 
> to the dagrun. I have a dag with many tasks and the delay between tasks can 
> rise to 10 seconds or more, I already changed the configuration, added 
> processes and memory, checked the code and did research, profiling and other 
> experiments. I hope that this change will make a drastic change in the delay. 
> I would be happy to discuss this solution, the research and other solutions 
> for this issue.  
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] boring-cyborg[bot] commented on issue #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run

2020-01-16 Thread GitBox
boring-cyborg[bot] commented on issue #4751: [AIRFLOW-3607] collected trigger 
rule dep check per dag run
URL: https://github.com/apache/airflow/pull/4751#issuecomment-575384786
 
 
   Awesome work, congrats on your first merged pull request!
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] KevinYang21 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

2020-01-16 Thread GitBox
KevinYang21 commented on a change in pull request #7141: [AIRFLOW-6544] add 
log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367675716
 
 

 ##
 File path: airflow/utils/log/es_task_handler.py
 ##
 @@ -255,7 +256,9 @@ def close(self):
 
 # Mark the end of file using end of log mark,
 # so we know where to stop while auto-tailing.
-self.handler.stream.write(self.end_of_log_mark)
+if self.write_stdout:
+print()
+self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   It means the last line would be something like `[2020-01-16 07:58:32,712] 
{es_task_handler.py:XXX} INFO [end_of_log_mark]`, which makes it hard for the 
reader to understand.
   
   I'm a bit lost as to how this removed the log_id from the end_of_log_mark. 
Isn't the log_id we constructed in this file only for log fetching? My 
understanding is that the log_id is determined when we upload the log, e.g. 
when we pipe stdout to logstash or when we upload the file through filebeat to 
logstash.
   
   Maybe I was understanding this wrong and there is indeed a bug. In that case 
I would agree on splitting this change into two PRs for sanity's sake.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-6556) Improving unclear and incomplete documentation

2020-01-16 Thread Jarek Potiuk (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017504#comment-17017504
 ] 

Jarek Potiuk commented on AIRFLOW-6556:
---

[~uncletoxa] (y)

> Improving unclear and incomplete documentation
> --
>
> Key: AIRFLOW-6556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6556
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: master
>Reporter: Jacob Ward
>Assignee: Jarek Potiuk
>Priority: Trivial
>
> To help improve documentation it was discussed in the mailing list that users 
> of Airflow should have somewhere to report missing, incomplete or unclear 
> documentation. Any users who find this should comment on this ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] potiuk commented on issue #4846: [AIRFLOW-4030] adding start to singularity for airflow

2020-01-16 Thread GitBox
potiuk commented on issue #4846: [AIRFLOW-4030] adding start to singularity for 
airflow
URL: https://github.com/apache/airflow/pull/4846#issuecomment-575354949
 
 
   One more thing: https://issues.apache.org/jira/browse/AIRFLOW-6556 - in this 
issue we started (yesterday) gather information from the users on what could be 
improved in our docs. Feel free to add your comment there. I will be catalysing 
an effort to improve the docs in the next month and I would like to have people 
who can tell what's wrong (in their opinion) with our docs and can later help 
us to verify if the documentation is improved.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] potiuk edited a comment on issue #4846: [AIRFLOW-4030] adding start to singularity for airflow

2020-01-16 Thread GitBox
potiuk edited a comment on issue #4846: [AIRFLOW-4030] adding start to 
singularity for airflow
URL: https://github.com/apache/airflow/pull/4846#issuecomment-575337231
 
 
   Just to make sure we understand each other - I am not defensive, I really 
try to understand and put into action some of the things that can make our 
community more welcoming. I understand your experience with this PR has not 
been good, so I really want to find out and get some ideas on what we can do 
better (or maybe give you some information you missed about what we have 
already changed). As you will see below - we are really open to that, and I 
in particular am very interested in it.
   
   > Also understandable, and in this case, as a new contributor, it would be 
helpful to have someone give guidance about what needs to be done, tested, etc.
   
   I perfectly understand that part. Actually you might want to take a look at 
the thread I started recently titled "Being even more welcoming community?" at 
our devlist:
   
https://lists.apache.org/thread.html/aa6006ae051a406b0dfe20297407efd2ddf1337293b33fdda4c8f5fd%40%3Cdev.airflow.apache.org%3E
   
   We specifically talked there about the need for a mentorship approach where 
more experienced community members could mentor the new ones and answer their 
questions. We also have Slack, where people ask questions (and other people - 
often committers - respond), and the devlist in general. 
   
   So what do you think in general we could do better here? What would help you 
as a new contributor?
   
   I would really encourage you to join the devlist thread about "Being more 
welcoming" and voice your concerns there. I really want to understand what 
your expectations are beyond what is currently there. Any ideas 
are welcome.
   
   > 30-40 times seems a bit excessive, but I don't want to judge. 
   
   Well. A few years ago I learned (from a person I value a lot) one of the 
best quotes:
   "If there is something that is painful - start doing it more often". It is 
counter-intuitive but brilliant advice in many things in IT - doing builds, 
releasing etc. It is the very same thing with rebases - the more often you do 
them, the less painful they are and the better you learn how to do them in the 
least painful way. That's why 30-40 is not a lot for me, because I sometimes do 
it daily. And it works beautifully, because at most I have one or two conflicts 
to solve that are easy/obvious.
   
   > It seems like there should be better organization around at least setting 
expectations for contribution. In my case, I neither knew what to do, I didn't 
feel empowered to do anything, and I didn't understand the architecture well 
enough or have enough experience with the community to know what I was supposed 
to do.
   
   I have a feeling that you are referring to Airflow as it was 10 months ago 
when you started it. And gee how we've changed since. Let me just ask you a few 
questions:
   - Do you know we have updated "Pull Request guidelines" which describe the 
requirements for contribution in fairly short but comprehensive way? 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines
 ?
   
   - Do you know that we have a detailed "new contributor's" workflow in our 
CONTRIBUTING.rst documentation where we graphically describe what the process 
looks like, and where we explain that you should get involved, discuss with 
the community, and be persistent: 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example
 ?
   
   - Do you know that we have "First time contributor's workshops" - 
https://cwiki.apache.org/confluence/display/AIRFLOW/First+time+contributor%27s+workshop
 where we encourage people to work together with committers and learn how to 
contribute and what the requirements are, and where they have a chance to do so. We've 
already run 3 of those and we are discussing 5 more. 
   
   - Do you know that we have a "Breeze" development environment that aims to 
be up-and-running in less than 10 minutes and allows you to quickly run any of 
the tests without the hassle of configuring your local environment? 
https://github.com/apache/airflow/blob/master/BREEZE.rst
   
   - Do you know that we have a detailed "Testing" document that describes how 
to test Airflow - including integration tests, Kubernetes tests, 
backend-specific tests:  
https://github.com/apache/airflow/blob/master/TESTING.rst
   
   I would really love to understand this - whether your knowledge of our 
guides/processes simply dates back 10 months, or whether you knew all those 
resources and they were not helpful. And ideally - if you know exactly what 
could help you - I would love it if you wrote it up in the devlist thread:
   
https://lists.apache.org/thread.html/aa6006ae051a406b0dfe20297407efd2ddf1337293b33fdda4c8f5fd%40%3Cdev.airflow.apache.org%3E
   
   > Yes, and to this point I'd say touche - there are people on the other side 
that aren't bitter, but need guidance. I've 

[jira] [Commented] (AIRFLOW-6556) Improving unclear and incomplete documentation

2020-01-16 Thread Anton Zayniev (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017488#comment-17017488
 ] 

Anton Zayniev commented on AIRFLOW-6556:


*Mailing list:*

I think we need some onboarding for people not familiar with that kind of 
interaction:
 * devlist etiquette
 * hints to ease list usage (like mail filters/rules, Pony Mail, etc.)
 * why the devlist is preferred over Slack, Gitter, etc.

 

> Improving unclear and incomplete documentation
> --
>
> Key: AIRFLOW-6556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6556
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: master
>Reporter: Jacob Ward
>Assignee: Jarek Potiuk
>Priority: Trivial
>
> To help improve documentation it was discussed in the mailing list that users 
> of Airflow should have somewhere to report missing, incomplete or unclear 
> documentation. Any users who find this should comment on this ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] potiuk opened a new pull request #7193: [AIRFLOW-XXXX] Adds branching strategy to documentation

2020-01-16 Thread GitBox
potiuk opened a new pull request #7193: [AIRFLOW-XXXX] Adds branching strategy 
to documentation
URL: https://github.com/apache/airflow/pull/7193
 
 
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Comment Edited] (AIRFLOW-4470) RBAC Github Enterprise OAuth provider callback URL?

2020-01-16 Thread Cooper Gillan (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017469#comment-17017469
 ] 

Cooper Gillan edited comment on AIRFLOW-4470 at 1/16/20 8:48 PM:
-

Another important note: we did need to override the {{AirflowSecurityManager}} 
{{add_user}} method to ensure that unique email addresses were generated. We 
put the following into {{webserver_config.py}}:

{code:python}
class MySecurityManager(AirflowSecurityManager):
"""Override add_user function to ensure unique email addresses."""

def add_user(
self, username, first_name, last_name, email, role, password="", 
hashed_password=""
):
"""Generic function to create user."""
return super().add_user(
username,
first_name,
last_name,
f"{username}@example.com",
role,
password,
hashed_password,
)


SECURITY_MANAGER_CLASS = MySecurityManager
{code}

As far as we could tell, there is a bug here in {{airflow}}: a unique 
username and email are required for {{ab_user}} even though GHE only returns 
the username.


was (Author: coopergillan):
Another important note: we did need to override the {{AirflowSecurityManager}} 
{{add_user}} method to ensure that unique email addresses were generated. We 
put the following into {{webserver_config.py}}:

{code:python}
class MySecurityManager(AirflowSecurityManager):
"""Override add_user function to ensure unique email addresses."""

def add_user(
self, username, first_name, last_name, email, role, password="", 
hashed_password=""
):
"""Generic function to create user."""
return super().add_user(
username,
first_name,
last_name,
f"{username}@example.com",
role,
password,
hashed_password,
)


SECURITY_MANAGER_CLASS = MySecurityManager
{code}

> RBAC Github Enterprise OAuth provider callback URL?
> ---
>
> Key: AIRFLOW-4470
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4470
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, webserver
>Affects Versions: 1.10.2
>Reporter: Geez
>Priority: Blocker
>  Labels: usability
> Attachments: airflow_ss0_2.PNG, airflow_sso3.PNG, airflow_sso4.PNG, 
> image-2019-10-30-16-25-14-436.png, image-2019-10-31-11-47-04-041.png
>
>
> Hi all,
> Quick question, when using RBAC with OAuth providers (1.10.2):
>  * we are not specifying the {{authenticate}} or {{auth_backend}} in the 
> [webserver] section of \{{airflow.cfg}}anymore
>  * Instead, we set the OAuth provider config in the flask-appbuilder's 
> {{webserver_config.py}}:
> {code:java}
>  
> # Adapting Google OAuth example to Github:
> OAUTH_PROVIDERS = [
> {'name':'github', 'icon':'fa-github', 'token_key':'access_token',
>  'remote_app': {
> 'base_url':'https://github.corporate-domain.com/login',
> 
> 'access_token_url':'https://github.corporate-domain.com/login/oauth/access_token',
> 
> 'authorize_url':'https://github.corporate-domain.com/login/oauth/authorize',
> 'request_token_url': None,
> 'consumer_key': '',
> 'consumer_secret': 'X',
>  }
> }
> ]
>  
> {code}
>  _Question:_
>  * so what callback URL do we specify in the app? 
> {{http:/webapp/ghe_oauth/callback}} would not work right? (example with 
> github entreprise)
> No matter what I specify for the callback url (/ghe_oauth/callback or 
> [http://webapp.com|http://webapp.com/]), I get an error message about 
> {{redirect_uri}} mismatch:
> {code:java}
> {{error=redirect_uri_mismatch_description=The+redirect_uri+MUST+match+the+registered+callback+URL+for+this+application
>  }}{code}
> _Docs ref:_
>  Here is how you setup OAuth with Github Entreprise on Airflow _*without*_ 
> RBAC: 
> [https://airflow.apache.org/security.html#github-enterprise-ghe-authentication]
> And here is how you setup OAuth via the {{webserver_config.py}} of 
> flask_appbuilder used by airflow _*with*_RBAC:
>  
> [https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-oauth]
> What's the *callback url* when using RBAC and OAuth with Airflow?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4470) RBAC Github Enterprise OAuth provider callback URL?

2020-01-16 Thread Cooper Gillan (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017469#comment-17017469
 ] 

Cooper Gillan commented on AIRFLOW-4470:


Another important note: we did need to override the {{AirflowSecurityManager}} 
{{add_user}} method to ensure that unique email addresses were generated. We 
put the following into {{webserver_config.py}}:

{code:python}
class MySecurityManager(AirflowSecurityManager):
"""Override add_user function to ensure unique email addresses."""

def add_user(
self, username, first_name, last_name, email, role, password="", 
hashed_password=""
):
"""Generic function to create user."""
return super().add_user(
username,
first_name,
last_name,
f"{username}@example.com",
role,
password,
hashed_password,
)


SECURITY_MANAGER_CLASS = MySecurityManager
{code}

> RBAC Github Enterprise OAuth provider callback URL?
> ---
>
> Key: AIRFLOW-4470
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4470
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, webserver
>Affects Versions: 1.10.2
>Reporter: Geez
>Priority: Blocker
>  Labels: usability
> Attachments: airflow_ss0_2.PNG, airflow_sso3.PNG, airflow_sso4.PNG, 
> image-2019-10-30-16-25-14-436.png, image-2019-10-31-11-47-04-041.png
>
>
> Hi all,
> Quick question, when using RBAC with OAuth providers (1.10.2):
>  * we are not specifying the {{authenticate}} or {{auth_backend}} in the 
> [webserver] section of \{{airflow.cfg}}anymore
>  * Instead, we set the OAuth provider config in the flask-appbuilder's 
> {{webserver_config.py}}:
> {code:java}
>  
> # Adapting Google OAuth example to Github:
> OAUTH_PROVIDERS = [
> {'name':'github', 'icon':'fa-github', 'token_key':'access_token',
>  'remote_app': {
> 'base_url':'https://github.corporate-domain.com/login',
> 
> 'access_token_url':'https://github.corporate-domain.com/login/oauth/access_token',
> 
> 'authorize_url':'https://github.corporate-domain.com/login/oauth/authorize',
> 'request_token_url': None,
> 'consumer_key': '',
> 'consumer_secret': 'X',
>  }
> }
> ]
>  
> {code}
>  _Question:_
>  * so what callback URL do we specify in the app? 
> {{http:/webapp/ghe_oauth/callback}} would not work right? (example with 
> github entreprise)
> No matter what I specify for the callback url (/ghe_oauth/callback or 
> [http://webapp.com|http://webapp.com/]), I get an error message about 
> {{redirect_uri}} mismatch:
> {code:java}
> {{error=redirect_uri_mismatch_description=The+redirect_uri+MUST+match+the+registered+callback+URL+for+this+application
>  }}{code}
> _Docs ref:_
>  Here is how you setup OAuth with Github Entreprise on Airflow _*without*_ 
> RBAC: 
> [https://airflow.apache.org/security.html#github-enterprise-ghe-authentication]
> And here is how you setup OAuth via the {{webserver_config.py}} of 
> flask_appbuilder used by airflow _*with*_RBAC:
>  
> [https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-oauth]
> What's the *callback url* when using RBAC and OAuth with Airflow?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] potiuk edited a comment on issue #4846: [AIRFLOW-4030] adding start to singularity for airflow

2020-01-16 Thread GitBox
potiuk edited a comment on issue #4846: [AIRFLOW-4030] adding start to 
singularity for airflow
URL: https://github.com/apache/airflow/pull/4846#issuecomment-575337231
 
 
   Just to make sure we understand each other - I am not defensive, I really 
try to understand and put into action some of the things that can make our 
community more welcoming. I understand your experience with this PR has not 
been good, so I really want to find out and get some ideas on what we can do 
better (or maybe give you some information you missed about what we have 
already changed). As you will see below - we are really open to that, and I 
in particular am very interested in it.
   
   > Also understandable, and in this case, as a new contributor, it would be 
helpful to have someone give guidance about what needs to be done, tested, etc.
   
   I perfectly understand that part. Actually you might want to take a look at 
the thread I started recently titled "Being even more welcoming community?" at 
our devlist:
   
https://lists.apache.org/thread.html/aa6006ae051a406b0dfe20297407efd2ddf1337293b33fdda4c8f5fd%40%3Cdev.airflow.apache.org%3E
   
   We specifically talked there about the need for a mentorship approach where 
more experienced community members could mentor the new ones and answer their 
questions. We also have Slack, where people ask questions (and other people - 
often committers - respond), and the devlist in general. 
   
   So what do you think in general we could do better here? What would help you 
as a new contributor?
   
   I would really encourage you to join the devlist thread about "Being more 
welcoming" and voice your concerns there. I really want to understand what 
your expectations are beyond what is currently there. Any ideas 
are welcome.
   
   > 30-40 times seems a bit excessive, but I don't want to judge. 
   
   Well. A few years ago I learned (from a person I value a lot) one of the 
best quotes:
   "If there is something that is painful - start doing it more often". It is 
counter-intuitive but brilliant advice in many things in IT - doing builds, 
releasing etc. It is the very same thing with rebases - the more often you do 
them, the less painful they are and the better you learn how to do them in the 
least painful way. That's why 30-40 is not a lot for me, because I sometimes do 
it daily. And it works beautifully, because at most I have one or two conflicts 
to solve that are easy/obvious.
   
   > It seems like there should be better organization around at least setting 
expectations for contribution. In my case, I neither knew what to do, I didn't 
feel empowered to do anything, and I didn't understand the architecture well 
enough or have enough experience with the community to know what I was supposed 
to do.
   
   I have a feeling that you are referring to Airflow as it was 10 months ago 
when you started it. And gee how we've changed since. Let me just ask you a few 
questions:
   - Do you know we have updated "Pull Request guidelines" which describe the 
requirements for contribution in fairly short but comprehensive way? 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines
 ?
   
   - Do you know that we have a detailed "new contributor's" workflow in our 
CONTRIBUTING.rst documentation where we graphically describe what the process 
looks like, and where we explain that you should get involved, discuss with 
the community, and be persistent: 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example
 ?
   
   - Do you know that we have "First time contributor's workshops" - 
https://cwiki.apache.org/confluence/display/AIRFLOW/First+time+contributor%27s+workshop
 where we encourage people to work together with committers and learn how to 
contribute and what the requirements are, and where they have a chance to do so. We've 
already run 3 of those and we are discussing 5 more. 
   
   - Do you know that we have a "Breeze" development environment that aims to 
be up-and-running in less than 10 minutes and allows you to quickly run any of 
the tests without the hassle of configuring your local environment? 
https://github.com/apache/airflow/blob/master/BREEZE.rst
   
   - Do you know that we have a detailed "Testing" document that describes how 
to test Airflow - including integration tests, Kubernetes tests, 
backend-specific tests:  
https://github.com/apache/airflow/blob/master/TESTING.rst
   
   I would really love to understand this - whether your knowledge of our 
guides/processes simply dates back 10 months, or whether you knew all those 
resources and they were not helpful. And ideally - if you know exactly what 
could help you - I would love it if you wrote it up in the devlist thread:
   
https://lists.apache.org/thread.html/aa6006ae051a406b0dfe20297407efd2ddf1337293b33fdda4c8f5fd%40%3Cdev.airflow.apache.org%3E
   
   > Yes, and to this point I'd say touche - there are people on the other side 
that aren't bitter, but need guidance. I've 

[GitHub] [airflow] potiuk commented on issue #4846: [AIRFLOW-4030] adding start to singularity for airflow

2020-01-16 Thread GitBox
potiuk commented on issue #4846: [AIRFLOW-4030] adding start to singularity for 
airflow
URL: https://github.com/apache/airflow/pull/4846#issuecomment-575337231
 
 
   Just to make sure we understand each other - I am not defensive, I really 
try to understand and put into action some of the things that can make our 
community more welcoming. I understand your experience with this PR has not 
been good, so I really want to find out and get some ideas on what we can do 
better (or maybe give you some information you missed about what we have 
already changed). As you will see below - we are really open to that, and I 
in particular am very interested in it.
   
   > Also understandable, and in this case, as a new contributor, it would be 
helpful to have someone give guidance about what needs to be done, tested, etc.
   
   I perfectly understand that part. Actually you might want to take a look at 
the thread I started recently titled "Being even more welcoming community?" at 
our devlist:
   
https://lists.apache.org/thread.html/aa6006ae051a406b0dfe20297407efd2ddf1337293b33fdda4c8f5fd%40%3Cdev.airflow.apache.org%3E
   
   We specifically talked there about the need for a mentorship approach where 
more experienced community members could mentor the new ones and answer their 
questions. We also have Slack, where people ask questions (and other people - 
often committers - respond), and the devlist in general. 
   
   So what do you think in general we could do better here? What would help you 
as a new contributor?
   
   I would really encourage you to join the devlist thread about "Being more 
welcoming" and voice your concerns there. I really want to understand what 
your expectations are beyond what is currently there. Any ideas 
are welcome.
   
   > 30-40 times seems a bit excessive, but I don't want to judge. 
   
   Well. A few years ago I learned (from a person I value a lot) one of the 
best quotes:
   "If there is something that is painful - start doing it more often". It is 
counter-intuitive but brilliant advice in many things in IT - doing builds, 
releasing etc. It is the very same thing with rebases - the more often you do 
them, the less painful they are and the better you learn how to do them in the 
least painful way. That's why 30-40 is not a lot for me, because I sometimes do 
it daily. And it works beautifully, because at most I have one or two conflicts 
to solve that are easy/obvious.
   
   > It seems like there should be better organization around at least setting 
expectations for contribution. In my case, I neither knew what to do, I didn't 
feel empowered to do anything, and I didn't understand the architecture well 
enough or have enough experience with the community to know what I was supposed 
to do.
   
   I have a feeling that you are referring to Airflow as it was 10 months ago 
when you started it. And gee how we've changed since. Let me just ask you a few 
questions:
   - Do you know we have updated "Pull Request guidelines" which describe the 
requirements for contribution in fairly short but comprehensive way? 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines
 ?
   
   - Do you know that we have a detailed "new contributor's" workflow in our 
CONTRIBUTING.rst documentation where we graphically describe what the process 
looks like, and where we explain that you should get involved, discuss with 
the community, and be persistent: 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example
 ?
   
   - Do you know that we have "First time contributor's workshops" - 
https://cwiki.apache.org/confluence/display/AIRFLOW/First+time+contributor%27s+workshop
 where we encourage people to work together with committers and learn how to 
contribute and what the requirements are, and where they have a chance to do so. We've 
already run 3 of those and we are discussing 5 more. 
   
   - Do you know that we have a "Breeze" development environment that aims to 
be up-and-running in less than 10 minutes and allows you to quickly run any of 
the tests without the hassle of configuring your local environment? 
https://github.com/apache/airflow/blob/master/BREEZE.rst
   
   - Do you know that we have a detailed "Testing" document that describes how 
to test Airflow - including integration tests, Kubernetes tests, 
backend-specific tests:  
https://github.com/apache/airflow/blob/master/TESTING.rst
   
   I would really love to understand this - whether your knowledge of our 
guides/processes simply dates back 10 months, or whether you knew all those 
resources and they were not helpful. And ideally - if you know exactly what 
could help you - I would love it if you wrote it up in the devlist thread:
   
https://lists.apache.org/thread.html/aa6006ae051a406b0dfe20297407efd2ddf1337293b33fdda4c8f5fd%40%3Cdev.airflow.apache.org%3E
   
   > Yes, and to this point I'd say touche - there are people on the other side 
that aren't bitter, but need guidance. I've been an 

[GitHub] [airflow] dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367632414
 
 

 ##
 File path: docs/howto/connection/odbc.rst
 ##
 @@ -0,0 +1,107 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+.. _howto/connection/odbc:
+
+ODBC Connection
+===
+
+The ``odbc`` connection type provides connection to ODBC data sources 
including MS SQL Server.
+
+Enable with ``pip install apache-airflow[odbc]``.
+
+
+System prerequisites
+
+
+This connection type uses `pyodbc `_, 
which has some system
+dependencies, as documented on the `pyodbc wiki 
`_.
+
+You must also install a driver:
+
+* `MS SQL ODBC drivers 
`_.
+
+* `Exasol ODBC drivers 
`_.
+
+
+Configuring the Connection
+--
+Host (required)
+The host to connect to.
+
+Schema (optional)
+Specify the schema name to be used in the database.
+
+Login (required)
+Specify the user name to connect.
+
+Password (required)
+Specify the password to connect.
+
+Extra (optional)
+Any key / value parameters supplied here will be added to the ODBC 
connection string.
+
+.. note::
+
+To use the hook 
:py:class:`~airflow.providers.odbc.hooks.odbc.OdbcHook` you must specify the
+driver you want to use, in the ``Connection.extra`` field or as a 
parameter at hook initialization.
+
+For example, consider the following value for ``extra``:
+
+.. code-block:: json
+
+{
+  "Driver": "ODBC Driver 17 for SQL Server",
+  "ApplicationIntent": "ReadOnly",
+  "TrustedConnection": "Yes"
+}
+
+This would produce a connection string containing these params:
+
+.. code-block::
+
+DRIVER={ODBC Driver 17 for SQL 
Server};ApplicationIntent=ReadOnly;TrustedConnection=Yes;
 
 Review comment:
   ok updated PTAL
   
   I highlighted that there are two reserved keywords, "connect_kwargs" and 
"sqlalchemy_scheme", but that everything else is incorporated into the ODBC 
connection string.
   
   I also added a note in the extras docs, close to the examples, that 
emphasizes you have to install your own driver, in addition to the 
instructions in "system prerequisites".
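   
   For anyone landing here later, a minimal usage sketch (assumptions: an Airflow 
connection with id "my_odbc" exists, the hook accepts a `driver` keyword at 
initialization as the docs above describe, and an MS SQL ODBC driver is installed 
on the worker):

```python
from airflow.providers.odbc.hooks.odbc import OdbcHook

# "my_odbc" is a hypothetical connection id; the driver could instead be set in the
# connection's Extra field, e.g. {"Driver": "ODBC Driver 17 for SQL Server"}.
hook = OdbcHook(
    odbc_conn_id="my_odbc",
    driver="ODBC Driver 17 for SQL Server",
)

# Standard DbApiHook-style helpers are available once the connection resolves.
rows = hook.get_records("SELECT 1 AS ping")
print(rows)
```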


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367632414
 
 

 ##
 File path: docs/howto/connection/odbc.rst
 ##
 @@ -0,0 +1,107 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+.. _howto/connection/odbc:
+
+ODBC Connection
+===
+
+The ``odbc`` connection type provides connection to ODBC data sources 
including MS SQL Server.
+
+Enable with ``pip install apache-airflow[odbc]``.
+
+
+System prerequisites
+
+
+This connection type uses `pyodbc `_, 
which has some system
+dependencies, as documented on the `pyodbc wiki 
`_.
+
+You must also install a driver:
+
+* `MS SQL ODBC drivers 
`_.
+
+* `Exasol ODBC drivers 
`_.
+
+
+Configuring the Connection
+--
+Host (required)
+The host to connect to.
+
+Schema (optional)
+Specify the schema name to be used in the database.
+
+Login (required)
+Specify the user name to connect.
+
+Password (required)
+Specify the password to connect.
+
+Extra (optional)
+Any key / value parameters supplied here will be added to the ODBC 
connection string.
+
+.. note::
+
+To use the hook 
:py:class:`~airflow.providers.odbc.hooks.odbc.OdbcHook` you must specify the
+driver you want to use, in the ``Connection.extra`` field or as a 
parameter at hook initialization.
+
+For example, consider the following value for ``extra``:
+
+.. code-block:: json
+
+{
+  "Driver": "ODBC Driver 17 for SQL Server",
+  "ApplicationIntent": "ReadOnly",
+  "TrustedConnection": "Yes"
+}
+
+This would produce a connection string containing these params:
+
+.. code-block::
+
+DRIVER={ODBC Driver 17 for SQL 
Server};ApplicationIntent=ReadOnly;TrustedConnection=Yes;
 
 Review comment:
   ok updated PTAL


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367629138
 
 

 ##
 File path: docs/howto/connection/odbc.rst
 ##
 @@ -0,0 +1,107 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+.. _howto/connection/odbc:
+
+ODBC Connection
+===
+
+The ``odbc`` connection type provides connection to ODBC data sources 
including MS SQL Server.
+
+Enable with ``pip install apache-airflow[odbc]``.
+
+
+System prerequisites
+
+
+This connection type uses `pyodbc `_, 
which has some system
+dependencies, as documented on the `pyodbc wiki 
`_.
+
+You must also install a driver:
+
+* `MS SQL ODBC drivers 
`_.
+
+* `Exasol ODBC drivers 
`_.
+
+
+Configuring the Connection
+--
+Host (required)
+The host to connect to.
+
+Schema (optional)
+Specify the schema name to be used in the database.
+
+Login (required)
+Specify the user name to connect.
+
+Password (required)
+Specify the password to connect.
+
+Extra (optional)
+Any key / value parameters supplied here will be added to the ODBC 
connection string.
+
+.. note::
+
+To use the hook 
:py:class:`~airflow.providers.odbc.hooks.odbc.OdbcHook` you must specify the
+driver you want to use, in the ``Connection.extra`` field or as a 
parameter at hook initialization.
+
+For example, consider the following value for ``extra``:
+
+.. code-block:: json
+
+{
+  "Driver": "ODBC Driver 17 for SQL Server",
+  "ApplicationIntent": "ReadOnly",
+  "TrustedConnection": "Yes"
+}
+
+This would produce a connection string containing these params:
+
+.. code-block::
+
+DRIVER={ODBC Driver 17 for SQL 
Server};ApplicationIntent=ReadOnly;TrustedConnection=Yes;
 
 Review comment:
   i am gonna tweak the docs a bit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on a change in pull request #6870: [AIRFLOW-0578] Check return code

2020-01-16 Thread GitBox
ashb commented on a change in pull request #6870: [AIRFLOW-0578] Check return 
code
URL: https://github.com/apache/airflow/pull/6870#discussion_r367626457
 
 

 ##
 File path: airflow/utils/state.py
 ##
 @@ -122,3 +122,10 @@ def unfinished(cls):
 cls.UP_FOR_RETRY,
 cls.UP_FOR_RESCHEDULE
 ]
+
+@classmethod
+def unsuccessful(cls):
+"""
+A list of states indicating that a task completed unsuccessfully.
+"""
+return [cls.FAILED, cls.UP_FOR_RETRY, cls.UP_FOR_RESCHEDULE]
 
 Review comment:
   Past me was right, and this is all good here.
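   
   As a side note, a minimal sketch (not from this PR) of how such a helper could be 
consumed by calling code; `needs_followup` is a hypothetical name used only for 
illustration:

```python
from airflow.utils.state import State

def needs_followup(ti_state: str) -> bool:
    """Return True when a finished task ended in one of the unsuccessful states."""
    # State.unsuccessful() is the classmethod proposed in the diff above.
    return ti_state in State.unsuccessful()

print(needs_followup(State.UP_FOR_RETRY))  # True
print(needs_followup(State.SUCCESS))       # False
```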


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] nuclearpinguin opened a new pull request #7192: [AIRFLOW-XXXX] Fix BigQuery example DAG

2020-01-16 Thread GitBox
nuclearpinguin opened a new pull request #7192: [AIRFLOW-XXXX] Fix BigQuery 
example DAG
URL: https://github.com/apache/airflow/pull/7192
 
 
   It seems that the BigQuery example doesn't work properly.
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [ ] Description above provides context of the change
   - [ ] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = 
JIRA ID*
   - [ ] Unit tests coverage for changes (not needed for documentation changes)
   - [ ] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [ ] Relevant documentation is updated including usage instructions.
   - [ ] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #5785: [AIRFLOW-5176] Add Azure Data Explorer (Kusto) operator

2020-01-16 Thread GitBox
codecov-io edited a comment on issue #5785: [AIRFLOW-5176] Add Azure Data 
Explorer (Kusto) operator
URL: https://github.com/apache/airflow/pull/5785#issuecomment-553413704
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/5785?src=pr=h1) 
Report
   > Merging 
[#5785](https://codecov.io/gh/apache/airflow/pull/5785?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/4a10fd1230868a5d27ce42005ba42e1a3e22d439?src=pr=desc)
 will **decrease** coverage by `0.09%`.
   > The diff coverage is `88.2%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/5785/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5785?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master    #5785      +/-   ##
   ==========================================
   - Coverage   84.23%   84.13%     -0.1%     
   ==========================================
     Files         682      725      +43     
     Lines       38454    39610    +1156     
   ==========================================
   + Hits        32390    33327     +937     
   - Misses       6064     6283     +219
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/5785?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/macros/hive.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9tYWNyb3MvaGl2ZS5weQ==)
 | `38.7% <ø> (ø)` | :arrow_up: |
   | 
[airflow/operators/gcs\_to\_bq.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZ2NzX3RvX2JxLnB5)
 | `70.58% <ø> (ø)` | :arrow_up: |
   | 
[airflow/operators/cassandra\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvY2Fzc2FuZHJhX3RvX2djcy5weQ==)
 | `64.91% <ø> (ø)` | :arrow_up: |
   | 
[...flow/contrib/example\_dags/example\_qubole\_sensor.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX3F1Ym9sZV9zZW5zb3IucHk=)
 | `100% <ø> (ø)` | :arrow_up: |
   | 
[airflow/contrib/hooks/qubole\_hook.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL3F1Ym9sZV9ob29rLnB5)
 | `52.67% <ø> (ø)` | :arrow_up: |
   | 
[airflow/hooks/hive\_hooks.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9oaXZlX2hvb2tzLnB5)
 | `100% <ø> (+22.39%)` | :arrow_up: |
   | 
[airflow/gcp/operators/dataflow.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9nY3Avb3BlcmF0b3JzL2RhdGFmbG93LnB5)
 | `99.07% <ø> (ø)` | :arrow_up: |
   | 
[...ontrib/example\_dags/example\_kubernetes\_operator.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX2t1YmVybmV0ZXNfb3BlcmF0b3IucHk=)
 | `78.57% <ø> (ø)` | :arrow_up: |
   | 
[airflow/models/variable.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdmFyaWFibGUucHk=)
 | `93.42% <ø> (ø)` | :arrow_up: |
   | 
[airflow/hooks/filesystem.py](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9maWxlc3lzdGVtLnB5)
 | `90% <ø> (ø)` | |
   | ... and [417 
more](https://codecov.io/gh/apache/airflow/pull/5785/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/5785?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/5785?src=pr=footer). 
Last update 
[4a10fd1...bc5d1e6](https://codecov.io/gh/apache/airflow/pull/5785?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] amichai07 commented on a change in pull request #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run

2020-01-16 Thread GitBox
amichai07 commented on a change in pull request #4751: [AIRFLOW-3607] collected 
trigger rule dep check per dag run
URL: https://github.com/apache/airflow/pull/4751#discussion_r367599292
 
 

 ##
 File path: tests/models/test_dagrun.py
 ##
 @@ -234,8 +234,9 @@ def test_dagrun_deadlock(self):
 ti_op2.set_state(state=State.NONE, session=session)
 
 dr.update_state()
-self.assertEqual(dr.state, State.RUNNING)
+self.assertEqual(dr.state, State.FAILED)
 
+dr.set_state(State.RUNNING)
 
 Review comment:
   The only thing I wondered about was whether it should be "upstream_failed" 
instead of "skipped" 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] amichai07 commented on issue #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run

2020-01-16 Thread GitBox
amichai07 commented on issue #4751: [AIRFLOW-3607] collected trigger rule dep 
check per dag run
URL: https://github.com/apache/airflow/pull/4751#issuecomment-575299248
 
 
   > That change looks like it fixed it, so I'll hold off on merging my PR
   
   Thanks, that would be great!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ktmud commented on a change in pull request #7088: [WIP][AIRFLOW-6494] SSH host_proxy has to be fresh

2020-01-16 Thread GitBox
ktmud commented on a change in pull request #7088: [WIP][AIRFLOW-6494] SSH 
host_proxy has to be fresh
URL: https://github.com/apache/airflow/pull/7088#discussion_r367582852
 
 

 ##
 File path: airflow/contrib/hooks/ssh_hook.py
 ##
 @@ -171,6 +171,13 @@ def get_conn(self) -> paramiko.SSHClient:
  'against Man-In-The-Middle attacks')
 # Default is RejectPolicy
 client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+
+# restart host_proxy to avoid "Broken Pipe" error
+if getattr(self, 'host_proxy') is not None:
+if not self.host_proxy.closed:
+self.host_proxy.close()
 
 Review comment:
   HostProxy (Paramiko's ProxyCommand) is basically a subprocess. Closing the 
proxy socket does not mean the subprocess is closed. This is for making sure 
the subprocess is also closed.
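   
   A minimal sketch of the idea (assumptions: `host_proxy` is a `paramiko.ProxyCommand` 
and its `process` attribute is the underlying `subprocess.Popen`, which is how paramiko 
implements ProxyCommand today):

```python
import paramiko

def refresh_proxy(host_proxy, proxy_command):
    """Close a stale ProxyCommand (socket and subprocess) and start a fresh one."""
    if host_proxy is not None:
        if not host_proxy.closed:
            host_proxy.close()  # closes the proxy socket / signals the subprocess
        process = getattr(host_proxy, "process", None)
        if process is not None and process.poll() is None:
            process.kill()      # make sure the ProxyCommand subprocess is really gone
    return paramiko.ProxyCommand(proxy_command)
```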


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #7187: [AIRFLOW-6576] fix scheduler crash caused by deleted task with sla misses

2020-01-16 Thread GitBox
codecov-io edited a comment on issue #7187: [AIRFLOW-6576] fix scheduler crash 
caused by deleted task with sla misses
URL: https://github.com/apache/airflow/pull/7187#issuecomment-574935439
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=h1) 
Report
   > Merging 
[#7187](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/59c8a826b8d5d365db68e800cea3de59256530c9?src=pr=desc)
 will **increase** coverage by `0.18%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/7187/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7187      +/-   ##
   ==========================================
   + Coverage   84.91%   85.09%    +0.18%     
   ==========================================
     Files         723      723              
     Lines       39546    39555       +9     
   ==========================================
   + Hits        33581    33660      +79     
   + Misses       5965     5895      -70
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `89.45% <100%> (+0.69%)` | :arrow_up: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | 
[airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5)
 | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `76.47% <0%> (-22.33%)` | :arrow_down: |
   | 
[airflow/models/connection.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==)
 | `68.78% <0%> (+0.97%)` | :arrow_up: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `91.88% <0%> (+1.44%)` | :arrow_up: |
   | 
[airflow/providers/apache/hive/hooks/hive.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2hpdmUvaG9va3MvaGl2ZS5weQ==)
 | `77.55% <0%> (+1.53%)` | :arrow_up: |
   | ... and [9 
more](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=footer). 
Last update 
[59c8a82...23dc8c1](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #7187: [AIRFLOW-6576] fix scheduler crash caused by deleted task with sla misses

2020-01-16 Thread GitBox
codecov-io edited a comment on issue #7187: [AIRFLOW-6576] fix scheduler crash 
caused by deleted task with sla misses
URL: https://github.com/apache/airflow/pull/7187#issuecomment-574935439
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=h1) 
Report
   > Merging 
[#7187](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/59c8a826b8d5d365db68e800cea3de59256530c9?src=pr=desc)
 will **increase** coverage by `0.18%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/7187/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7187      +/-   ##
   ==========================================
   + Coverage   84.91%   85.09%    +0.18%     
   ==========================================
     Files         723      723              
     Lines       39546    39555       +9     
   ==========================================
   + Hits        33581    33660      +79     
   + Misses       5965     5895      -70
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `89.45% <100%> (+0.69%)` | :arrow_up: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | 
[airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5)
 | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `76.47% <0%> (-22.33%)` | :arrow_down: |
   | 
[airflow/models/connection.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==)
 | `68.78% <0%> (+0.97%)` | :arrow_up: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `91.88% <0%> (+1.44%)` | :arrow_up: |
   | 
[airflow/providers/apache/hive/hooks/hive.py](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2hpdmUvaG9va3MvaGl2ZS5weQ==)
 | `77.55% <0%> (+1.53%)` | :arrow_up: |
   | ... and [9 
more](https://codecov.io/gh/apache/airflow/pull/7187/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=footer). 
Last update 
[59c8a82...23dc8c1](https://codecov.io/gh/apache/airflow/pull/7187?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ktmud commented on issue #7088: [WIP][AIRFLOW-6494] SSH host_proxy has to be fresh

2020-01-16 Thread GitBox
ktmud commented on issue #7088: [WIP][AIRFLOW-6494] SSH host_proxy has to be 
fresh
URL: https://github.com/apache/airflow/pull/7088#issuecomment-575285316
 
 
   > Would it be too much to ask for a test covering this scenario of refresh 
(mocking Paramiko)?
   
   Working on it!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] lucafuji commented on issue #6870: [AIRFLOW-0578] Check return code

2020-01-16 Thread GitBox
lucafuji commented on issue #6870: [AIRFLOW-0578] Check return code
URL: https://github.com/apache/airflow/pull/6870#issuecomment-575284775
 
 
   @ashb ^^


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run

2020-01-16 Thread GitBox
codecov-io edited a comment on issue #4751: [AIRFLOW-3607] collected trigger 
rule dep check per dag run
URL: https://github.com/apache/airflow/pull/4751#issuecomment-466029246
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/4751?src=pr=h1) 
Report
   > Merging 
[#4751](https://codecov.io/gh/apache/airflow/pull/4751?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/95087af14091f28a83ced8ff1860b86dfd93f93d?src=pr=desc)
 will **increase** coverage by `0.27%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/4751/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/4751?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##           master    #4751      +/-   ##
    ==========================================
    + Coverage   84.81%   85.08%   +0.27%     
    ==========================================
      Files         679      723      +44     
      Lines       38493    39558    +1065     
    ==========================================
    + Hits        32646    33658    +1012     
    - Misses       5847     5900      +53
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/4751?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `89.23% <100%> (-0.05%)` | :arrow_down: |
   | 
[airflow/ti\_deps/deps/trigger\_rule\_dep.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcHMvdHJpZ2dlcl9ydWxlX2RlcC5weQ==)
 | `91.25% <100%> (+0.46%)` | :arrow_up: |
   | 
[airflow/ti\_deps/dep\_context.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcF9jb250ZXh0LnB5)
 | `100% <100%> (ø)` | :arrow_up: |
   | 
[airflow/models/dagrun.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFncnVuLnB5)
 | `96.36% <100%> (-0.23%)` | :arrow_down: |
   | 
[airflow/contrib/hooks/azure\_data\_lake\_hook.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2F6dXJlX2RhdGFfbGFrZV9ob29rLnB5)
 | `0% <0%> (-93.11%)` | :arrow_down: |
   | 
[airflow/contrib/sensors/azure\_cosmos\_sensor.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvYXp1cmVfY29zbW9zX3NlbnNvci5weQ==)
 | `0% <0%> (-81.25%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | 
[airflow/security/kerberos.py](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==)
 | `30.43% <0%> (-45.66%)` | :arrow_down: |
   | ... and [240 
more](https://codecov.io/gh/apache/airflow/pull/4751/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/4751?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/4751?src=pr=footer). 
Last update 
[95087af...5890619](https://codecov.io/gh/apache/airflow/pull/4751?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-6582) Dag_stats endpoint doesn't filter correctly

2020-01-16 Thread Robin Edwards (Jira)
Robin Edwards created AIRFLOW-6582:
--

 Summary: Dag_stats endpoint doesn't filter correctly
 Key: AIRFLOW-6582
 URL: https://issues.apache.org/jira/browse/AIRFLOW-6582
 Project: Apache Airflow
  Issue Type: Bug
  Components: ui, webserver
Affects Versions: 1.10.7, 2.0.0, master
Reporter: Robin Edwards
Assignee: Robin Edwards


Apologies: my previous PR, which restricted the dags returned from the dag_stats endpoint 
via a GET parameter, applied the filter after the group by, which had no effect. 
So even if dag_ids was passed, all dags were still returned.

The forthcoming PR fixes the issue by applying the filter before the group by.
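
As a rough illustration of the fix direction, here is a minimal SQLAlchemy sketch with assumed model and parameter names (illustrative only, not the actual PR diff):

```python
# Minimal sketch (assumed model/column names), not the actual PR diff.
from sqlalchemy import func

def dag_stats_counts(session, DagRun, filter_dag_ids=None):
    # Build the aggregate query, restricting rows BEFORE the GROUP BY so the
    # dag_ids filter actually limits what gets counted.
    query = session.query(DagRun.dag_id, DagRun.state, func.count(DagRun.dag_id))
    if filter_dag_ids is not None:
        query = query.filter(DagRun.dag_id.in_(filter_dag_ids))
    return query.group_by(DagRun.dag_id, DagRun.state).all()
```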



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367560908
 
 

 ##
 File path: docs/howto/connection/odbc.rst
 ##
 @@ -0,0 +1,107 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+.. _howto/connection/odbc:
+
+ODBC Connection
+===
+
+The ``odbc`` connection type provides connection to ODBC data sources 
including MS SQL Server.
+
+Enable with ``pip install apache-airflow[odbc]``.
+
+
+System prerequisites
+
+
+This connection type uses `pyodbc `_, 
which has some system
+dependencies, as documented on the `pyodbc wiki 
`_.
+
+You must also install a driver:
+
+* `MS SQL ODBC drivers 
`_.
+
+* `Exasol ODBC drivers 
`_.
+
+
+Configuring the Connection
+--
+Host (required)
+The host to connect to.
+
+Schema (optional)
+Specify the schema name to be used in the database.
+
+Login (required)
+Specify the user name to connect.
+
+Password (required)
+Specify the password to connect.
+
+Extra (optional)
+Any key / value parameters supplied here will be added to the ODBC 
connection string.
+
+.. note::
+
+To use the hook 
:py:class:`~airflow.providers.odbc.hooks.odbc.OdbcHook` you must specify the
+driver you want to use, in the ``Connection.extra`` field or as a 
parameter at hook initialization.
+
+For example, consider the following value for ``extra``:
+
+.. code-block:: json
+
+{
+  "Driver": "ODBC Driver 17 for SQL Server",
+  "ApplicationIntent": "ReadOnly",
+  "TrustedConnection": "Yes"
+}
+
+This would produce a connection string containing these params:
+
+.. code-block::
+
+DRIVER={ODBC Driver 17 for SQL 
Server};ApplicationIntent=ReadOnly;TrustedConnection=Yes;
 
 Review comment:
   There is a system prerequisites section:
   
   
https://github.com/apache/airflow/pull/6850/files/34acc19fe7a006970e41d1868ccfac3aa743030e#diff-b7df04f972d4eed60ebf523a5c9879d8R28
   
   Does this address your concern?  
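   
   As an aside on the ``Extra`` example quoted above, a minimal Python sketch of how such a mapping could be rendered into the connection string shown in the doc (illustrative only, not the hook's actual code):
   
```python
# Illustrative only -- not the OdbcHook implementation.
def build_odbc_conn_str(driver: str, extra: dict) -> str:
    parts = [f"DRIVER={{{driver}}}"]  # braces are required around the driver name
    parts += [f"{k}={v}" for k, v in extra.items() if k.lower() != "driver"]
    return ";".join(parts) + ";"

print(build_odbc_conn_str(
    "ODBC Driver 17 for SQL Server",
    {"ApplicationIntent": "ReadOnly", "TrustedConnection": "Yes"},
))
# DRIVER={ODBC Driver 17 for SQL Server};ApplicationIntent=ReadOnly;TrustedConnection=Yes;
```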


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367559741
 
 

 ##
 File path: airflow/providers/odbc/hooks/odbc.py
 ##
 @@ -0,0 +1,220 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module contains ODBC hook.
+"""
+from typing import Optional
+from urllib.parse import quote_plus
+
+import pyodbc
+
+from airflow.hooks.dbapi_hook import DbApiHook
+from airflow.utils.helpers import merge_dicts
+
+
+class OdbcHook(DbApiHook):
+"""
+Interact with odbc data sources using pyodbc.
+
+See :ref:`howto/connection/odbc` for full documentation.
+"""
+
+DEFAULT_SQLALCHEMY_SCHEME = 'mssql+pyodbc'
+conn_name_attr = 'odbc_conn_id'
+default_conn_name = 'odbc_default'
+supports_autocommit = True
+
+def __init__(
+self,
+*args,
+database: Optional[str] = None,
+driver: Optional[str] = None,
+dsn: Optional[str] = None,
+connect_kwargs: Optional[dict] = None,
+sqlalchemy_scheme: Optional[str] = None,
+**kwargs,
+):
+"""
+:param args: passed to DbApiHook
+:param database: database to use -- overrides connection ``schema``
+:param driver: name of driver or path to driver. overrides driver 
supplied in connection ``extra``
+:param dsn: name of DSN to use.  overrides DSN supplied in connection 
``extra``
+:param connect_kwargs: keyword arguments passed to ``pyodbc.connect``
+:param sqlalchemy_scheme: Scheme sqlalchemy connection.  Default is 
``mssql+pyodbc`` Only used for
+  ``get_sqlalchemy_engine`` and ``get_sqlalchemy_connection`` methods.
+:param kwargs: passed to DbApiHook
+"""
+super().__init__(*args, **kwargs)
+self._database = database
+self._driver = driver
+self._dsn = dsn
+self._conn_str = None
+self._sqlalchemy_scheme = sqlalchemy_scheme
+self._connection = None
+self._connect_kwargs = connect_kwargs
+
+@property
+def connection(self):
+"""
+``airflow.Connection`` object with connection id ``odbc_conn_id``
+"""
+if not self._connection:
+self._connection = self.get_connection(getattr(self, 
self.conn_name_attr))
+return self._connection
+
+@property
+def database(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return self._database or self.connection.schema
+
+@property
+def sqlalchemy_scheme(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return (
+self._sqlalchemy_scheme or
+self.connection_extra_lower.get('sqlalchemy_scheme') or
+self.DEFAULT_SQLALCHEMY_SCHEME
+)
+
+@property
+def connection_extra_lower(self):
+"""
+``connection.extra_dejson`` but where keys are converted to lower case.
+
+This is used internally for case-insensitive access of odbc params.
+"""
+return {k.lower(): v for k, v in self.connection.extra_dejson.items()}
+
+@property
+def driver(self):
+"""
+Driver from init param if given; else try to find one in connection 
extra.
+"""
+if not self._driver:
+driver = self.connection_extra_lower.get('driver')
+if driver:
+self._driver = driver
+return self._driver and 
self._driver.strip().lstrip('{').rstrip('}').strip()
 
 Review comment:
   This way your conn extra can look like this:
   ```
   {
 "Driver": "ODBC Driver 17 for SQL Server",
 "ApplicationIntent": "ReadOnly",
 "TrustedConnection": "Yes"
   }
   ```
   (i.e. no braces)
   
   If you build your conn_uri from env vars, this is especially helpful, because 
in the airflow conn_uri format you have to urlencode curly braces, so with braces it 
would look like this:
   ```
   
'none://?Driver=%7BODBC+Driver+17+for+SQL+Server%7D&ApplicationIntent=ReadOnly&TrustedConnection=Yes'
   ```
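   
   A small standalone sketch of that encoding step, assuming you build the URI yourself from a dict (the `none://` scheme is just copied from the example above; the rest is standard-library URL encoding):
   
```python
from urllib.parse import urlencode

extra = {
    "Driver": "{ODBC Driver 17 for SQL Server}",  # braces present, so they must be escaped
    "ApplicationIntent": "ReadOnly",
    "TrustedConnection": "Yes",
}
# urlencode percent-encodes the braces and turns spaces into '+':
print("none://?" + urlencode(extra))
# none://?Driver=%7BODBC+Driver+17+for+SQL+Server%7D&ApplicationIntent=ReadOnly&TrustedConnection=Yes
```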
   
   

[GitHub] [airflow] houqp commented on issue #7187: [AIRFLOW-6576] fix scheduler crash caused by deleted task with sla misses

2020-01-16 Thread GitBox
houqp commented on issue #7187: [AIRFLOW-6576] fix scheduler crash caused by 
deleted task with sla misses
URL: https://github.com/apache/airflow/pull/7187#issuecomment-575264382
 
 
   @potiuk added logging, ready for review again.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on issue #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run

2020-01-16 Thread GitBox
ashb commented on issue #4751: [AIRFLOW-3607] collected trigger rule dep check 
per dag run
URL: https://github.com/apache/airflow/pull/4751#issuecomment-575263895
 
 
   That change looks like it fixed it, so I'll hold off on merging my PR


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367556670
 
 

 ##
 File path: airflow/providers/odbc/hooks/odbc.py
 ##
 @@ -0,0 +1,220 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module contains ODBC hook.
+"""
+from typing import Optional
+from urllib.parse import quote_plus
+
+import pyodbc
+
+from airflow.hooks.dbapi_hook import DbApiHook
+from airflow.utils.helpers import merge_dicts
+
+
+class OdbcHook(DbApiHook):
+"""
+Interact with odbc data sources using pyodbc.
+
+See :ref:`howto/connection/odbc` for full documentation.
+"""
+
+DEFAULT_SQLALCHEMY_SCHEME = 'mssql+pyodbc'
+conn_name_attr = 'odbc_conn_id'
+default_conn_name = 'odbc_default'
+supports_autocommit = True
+
+def __init__(
+self,
+*args,
+database: Optional[str] = None,
+driver: Optional[str] = None,
+dsn: Optional[str] = None,
+connect_kwargs: Optional[dict] = None,
+sqlalchemy_scheme: Optional[str] = None,
+**kwargs,
+):
+"""
+:param args: passed to DbApiHook
+:param database: database to use -- overrides connection ``schema``
+:param driver: name of driver or path to driver. overrides driver 
supplied in connection ``extra``
+:param dsn: name of DSN to use.  overrides DSN supplied in connection 
``extra``
+:param connect_kwargs: keyword arguments passed to ``pyodbc.connect``
+:param sqlalchemy_scheme: Scheme sqlalchemy connection.  Default is 
``mssql+pyodbc`` Only used for
+  ``get_sqlalchemy_engine`` and ``get_sqlalchemy_connection`` methods.
+:param kwargs: passed to DbApiHook
+"""
+super().__init__(*args, **kwargs)
+self._database = database
+self._driver = driver
+self._dsn = dsn
+self._conn_str = None
+self._sqlalchemy_scheme = sqlalchemy_scheme
+self._connection = None
+self._connect_kwargs = connect_kwargs
+
+@property
+def connection(self):
+"""
+``airflow.Connection`` object with connection id ``odbc_conn_id``
+"""
+if not self._connection:
+self._connection = self.get_connection(getattr(self, 
self.conn_name_attr))
+return self._connection
+
+@property
+def database(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return self._database or self.connection.schema
+
+@property
+def sqlalchemy_scheme(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return (
+self._sqlalchemy_scheme or
+self.connection_extra_lower.get('sqlalchemy_scheme') or
+self.DEFAULT_SQLALCHEMY_SCHEME
+)
+
+@property
+def connection_extra_lower(self):
+"""
+``connection.extra_dejson`` but where keys are converted to lower case.
+
+This is used internally for case-insensitive access of odbc params.
+"""
+return {k.lower(): v for k, v in self.connection.extra_dejson.items()}
+
+@property
+def driver(self):
+"""
+Driver from init param if given; else try to find one in connection 
extra.
+"""
+if not self._driver:
+driver = self.connection_extra_lower.get('driver')
+if driver:
+self._driver = driver
+return self._driver and 
self._driver.strip().lstrip('{').rstrip('}').strip()
 
 Review comment:
   I am stripping it so that the user can provide either `{ODBC Driver 17 for SQL Server}` or 
`ODBC Driver 17 for SQL Server`, and either way will work.
   
   This way, whether or not the user has provided the braces, I can safely add the curly 
braces when I build the connection string.
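   
   To make the two accepted spellings concrete, a tiny standalone sketch of the same stripping logic (not the hook itself):
   
```python
def normalize_driver(raw: str) -> str:
    # Same expression as in the quoted hook code: tolerate a driver name with or without braces.
    return raw.strip().lstrip("{").rstrip("}").strip()

assert normalize_driver("{ODBC Driver 17 for SQL Server}") == "ODBC Driver 17 for SQL Server"
assert normalize_driver("ODBC Driver 17 for SQL Server") == "ODBC Driver 17 for SQL Server"

# Braces can then be added unconditionally when building the connection string:
driver = normalize_driver("{ODBC Driver 17 for SQL Server}")
prefix = f"DRIVER={{{driver}}};"  # -> DRIVER={ODBC Driver 17 for SQL Server};
```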


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For 

[GitHub] [airflow] dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367555606
 
 

 ##
 File path: airflow/providers/odbc/hooks/odbc.py
 ##
 @@ -0,0 +1,220 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module contains ODBC hook.
+"""
+from typing import Optional
+from urllib.parse import quote_plus
+
+import pyodbc
+
+from airflow.hooks.dbapi_hook import DbApiHook
+from airflow.utils.helpers import merge_dicts
+
+
+class OdbcHook(DbApiHook):
+"""
+Interact with odbc data sources using pyodbc.
+
+See :ref:`howto/connection/odbc` for full documentation.
+"""
+
+DEFAULT_SQLALCHEMY_SCHEME = 'mssql+pyodbc'
+conn_name_attr = 'odbc_conn_id'
+default_conn_name = 'odbc_default'
+supports_autocommit = True
+
+def __init__(
+self,
+*args,
+database: Optional[str] = None,
+driver: Optional[str] = None,
+dsn: Optional[str] = None,
+connect_kwargs: Optional[dict] = None,
+sqlalchemy_scheme: Optional[str] = None,
+**kwargs,
+):
+"""
+:param args: passed to DbApiHook
+:param database: database to use -- overrides connection ``schema``
+:param driver: name of driver or path to driver. overrides driver 
supplied in connection ``extra``
+:param dsn: name of DSN to use.  overrides DSN supplied in connection 
``extra``
+:param connect_kwargs: keyword arguments passed to ``pyodbc.connect``
+:param sqlalchemy_scheme: Scheme sqlalchemy connection.  Default is 
``mssql+pyodbc`` Only used for
+  ``get_sqlalchemy_engine`` and ``get_sqlalchemy_connection`` methods.
+:param kwargs: passed to DbApiHook
+"""
+super().__init__(*args, **kwargs)
+self._database = database
+self._driver = driver
+self._dsn = dsn
+self._conn_str = None
+self._sqlalchemy_scheme = sqlalchemy_scheme
+self._connection = None
+self._connect_kwargs = connect_kwargs
+
+@property
+def connection(self):
+"""
+``airflow.Connection`` object with connection id ``odbc_conn_id``
+"""
+if not self._connection:
+self._connection = self.get_connection(getattr(self, 
self.conn_name_attr))
+return self._connection
+
+@property
+def database(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return self._database or self.connection.schema
+
+@property
+def sqlalchemy_scheme(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return (
+self._sqlalchemy_scheme or
+self.connection_extra_lower.get('sqlalchemy_scheme') or
+self.DEFAULT_SQLALCHEMY_SCHEME
+)
+
+@property
+def connection_extra_lower(self):
+"""
+``connection.extra_dejson`` but where keys are converted to lower case.
+
+This is used internally for case-insensitive access of odbc params.
+"""
+return {k.lower(): v for k, v in self.connection.extra_dejson.items()}
+
+@property
+def driver(self):
+"""
+Driver from init param if given; else try to find one in connection 
extra.
+"""
+if not self._driver:
+driver = self.connection_extra_lower.get('driver')
+if driver:
+self._driver = driver
+return self._driver and 
self._driver.strip().lstrip('{').rstrip('}').strip()
+
+@property
+def dsn(self):
+"""
+DSN from init param if given; else try to find one in connection extra.
+"""
+if not self._dsn:
+dsn = self.connection_extra_lower.get('dsn')
+if dsn:
+self._dsn = dsn.strip()
+return self._dsn
+
+@property
+def odbc_connection_string(self):
+"""
+ODBC connection string
+We build connection string instead of using ``pyodbc.connect`` params 
because, for example, 

[jira] [Commented] (AIRFLOW-2971) Health check command for scheduler

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017330#comment-17017330
 ] 

ASF GitHub Bot commented on AIRFLOW-2971:
-

stale[bot] commented on pull request #6277: [AIRFLOW-2971] Add health check CLI 
for scheduler
URL: https://github.com/apache/airflow/pull/6277
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Health check command for scheduler
> --
>
> Key: AIRFLOW-2971
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2971
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Jon Davies
>Priority: Major
>
> As part of a Kubernetes deployment of Airflow, I would like to define an exec 
> command based health check for the Airflow scheduler:
> - 
> https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
> ...the webserver is simple as all that needs is checking that the HTTP port 
> is available. For the scheduler, it would be neat to have a command such as:
> airflow scheduler health
> That returned OK and exit 0/NOT OK and a non-zero value when it cannot reach 
> the database for instance.
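
A rough standalone sketch of what such a check could look like (an assumption for illustration, not the implementation in PR #6277): exit 0 when the metadata database answers a trivial query, non-zero otherwise.

```python
import sys

from sqlalchemy import create_engine, text


def scheduler_health(sql_alchemy_conn: str) -> int:
    """Return 0 (OK) if the metadata DB is reachable, 1 (NOT OK) otherwise."""
    try:
        engine = create_engine(sql_alchemy_conn)
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        print("OK")
        return 0
    except Exception as exc:
        print(f"NOT OK: {exc}")
        return 1


if __name__ == "__main__":
    # In Airflow this connection string would come from [core] sql_alchemy_conn.
    sys.exit(scheduler_health(sys.argv[1]))
```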



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
dstandish commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367554581
 
 

 ##
 File path: airflow/providers/odbc/hooks/odbc.py
 ##
 @@ -0,0 +1,220 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module contains ODBC hook.
+"""
+from typing import Optional
+from urllib.parse import quote_plus
+
+import pyodbc
+
+from airflow.hooks.dbapi_hook import DbApiHook
+from airflow.utils.helpers import merge_dicts
+
+
+class OdbcHook(DbApiHook):
+"""
+Interact with odbc data sources using pyodbc.
+
+See :ref:`howto/connection/odbc` for full documentation.
+"""
+
+DEFAULT_SQLALCHEMY_SCHEME = 'mssql+pyodbc'
+conn_name_attr = 'odbc_conn_id'
+default_conn_name = 'odbc_default'
+supports_autocommit = True
+
+def __init__(
+self,
+*args,
+database: Optional[str] = None,
+driver: Optional[str] = None,
+dsn: Optional[str] = None,
+connect_kwargs: Optional[dict] = None,
+sqlalchemy_scheme: Optional[str] = None,
+**kwargs,
+):
+"""
+:param args: passed to DbApiHook
+:param database: database to use -- overrides connection ``schema``
+:param driver: name of driver or path to driver. overrides driver 
supplied in connection ``extra``
+:param dsn: name of DSN to use.  overrides DSN supplied in connection 
``extra``
+:param connect_kwargs: keyword arguments passed to ``pyodbc.connect``
+:param sqlalchemy_scheme: Scheme sqlalchemy connection.  Default is 
``mssql+pyodbc`` Only used for
+  ``get_sqlalchemy_engine`` and ``get_sqlalchemy_connection`` methods.
+:param kwargs: passed to DbApiHook
+"""
+super().__init__(*args, **kwargs)
+self._database = database
+self._driver = driver
+self._dsn = dsn
+self._conn_str = None
+self._sqlalchemy_scheme = sqlalchemy_scheme
+self._connection = None
+self._connect_kwargs = connect_kwargs
+
+@property
+def connection(self):
+"""
+``airflow.Connection`` object with connection id ``odbc_conn_id``
+"""
+if not self._connection:
+self._connection = self.get_connection(getattr(self, 
self.conn_name_attr))
+return self._connection
+
+@property
+def database(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return self._database or self.connection.schema
+
+@property
+def sqlalchemy_scheme(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return (
+self._sqlalchemy_scheme or
+self.connection_extra_lower.get('sqlalchemy_scheme') or
+self.DEFAULT_SQLALCHEMY_SCHEME
+)
+
+@property
+def connection_extra_lower(self):
+"""
+``connection.extra_dejson`` but where keys are converted to lower case.
+
+This is used internally for case-insensitive access of odbc params.
+"""
+return {k.lower(): v for k, v in self.connection.extra_dejson.items()}
+
+@property
+def driver(self):
+"""
+Driver from init param if given; else try to find one in connection 
extra.
+"""
+if not self._driver:
+driver = self.connection_extra_lower.get('driver')
+if driver:
+self._driver = driver
+return self._driver and 
self._driver.strip().lstrip('{').rstrip('}').strip()
+
+@property
+def dsn(self):
+"""
+DSN from init param if given; else try to find one in connection extra.
+"""
+if not self._dsn:
+dsn = self.connection_extra_lower.get('dsn')
+if dsn:
+self._dsn = dsn.strip()
+return self._dsn
+
+@property
+def odbc_connection_string(self):
+"""
+ODBC connection string
+We build connection string instead of using ``pyodbc.connect`` params 
because, for example, 

[GitHub] [airflow] stale[bot] closed pull request #6277: [AIRFLOW-2971] Add health check CLI for scheduler

2020-01-16 Thread GitBox
stale[bot] closed pull request #6277: [AIRFLOW-2971] Add health check CLI for 
scheduler
URL: https://github.com/apache/airflow/pull/6277
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-6558) Campaign Manager Operators to insert and modify conversions

2020-01-16 Thread Tomasz Urbaszek (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Urbaszek updated AIRFLOW-6558:
-
Summary: Campaign Manager Operators to insert and modify conversions  (was: 
Campaign Manager Operators to insert and modify conversations)

> Campaign Manager Operators to insert and modify conversions
> ---
>
> Key: AIRFLOW-6558
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6558
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: gcp, operators
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] amichai07 commented on a change in pull request #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run

2020-01-16 Thread GitBox
amichai07 commented on a change in pull request #4751: [AIRFLOW-3607] collected 
trigger rule dep check per dag run
URL: https://github.com/apache/airflow/pull/4751#discussion_r367551483
 
 

 ##
 File path: tests/models/test_dagrun.py
 ##
 @@ -234,8 +234,9 @@ def test_dagrun_deadlock(self):
 ti_op2.set_state(state=State.NONE, session=session)
 
 dr.update_state()
-self.assertEqual(dr.state, State.RUNNING)
+self.assertEqual(dr.state, State.FAILED)
 
 Review comment:
   Ok, I found the reason. What happens here is that the later task's state 
changes to "skipped" (which is a bug in my opinion; it should be "upstream 
failed"). Anyway, I deleted the part where it checks whether a task state has changed 
- that was a mistake, and I will put it back now.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on a change in pull request #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run

2020-01-16 Thread GitBox
ashb commented on a change in pull request #4751: [AIRFLOW-3607] collected 
trigger rule dep check per dag run
URL: https://github.com/apache/airflow/pull/4751#discussion_r367549812
 
 

 ##
 File path: tests/models/test_dagrun.py
 ##
 @@ -234,8 +234,9 @@ def test_dagrun_deadlock(self):
 ti_op2.set_state(state=State.NONE, session=session)
 
 dr.update_state()
-self.assertEqual(dr.state, State.RUNNING)
+self.assertEqual(dr.state, State.FAILED)
 
+dr.set_state(State.RUNNING)
 
 Review comment:
   So before your change these tasks ended up in this state: `[, ]`. And looking at the TriggerRuleDep for ONE_FAILED:
   
   ```
    elif tr == TR.ONE_FAILED:
        if upstream_done and not (failed or upstream_failed):
            ti.set_state(State.SKIPPED, session)
   ```
   
   So that looks right - op2/B should be in SKIPPED state.
   
   @amichai07 So this is a bug, and the test was right and shouldn't be changed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

2020-01-16 Thread GitBox
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file 
mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-575254776
 
 
   > * g another ticket?
   Please see the test case. I need the index parameter so that I can ensure 
there is only one log line in the index, and I use a separate index for that. So 
it would be hard not to have the index parameter.
   >   general note: i would like to see test case to proof bug and improve 
test coverage after this change.
   
   Added test cases.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] vamega commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
vamega commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367497800
 
 

 ##
 File path: docs/howto/connection/odbc.rst
 ##
 @@ -0,0 +1,107 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+.. _howto/connection/odbc:
+
+ODBC Connection
+===
+
+The ``odbc`` connection type provides connection to ODBC data sources 
including MS SQL Server.
+
+Enable with ``pip install apache-airflow[odbc]``.
+
+
+System prerequisites
+
+
+This connection type uses `pyodbc `_, 
which has some system
+dependencies, as documented on the `pyodbc wiki 
`_.
+
+You must also install a driver:
+
+* `MS SQL ODBC drivers 
`_.
+
+* `Exasol ODBC drivers 
`_.
+
+
+Configuring the Connection
+--
+Host (required)
+The host to connect to.
+
+Schema (optional)
+Specify the schema name to be used in the database.
+
+Login (required)
+Specify the user name to connect.
+
+Password (required)
+Specify the password to connect.
+
+Extra (optional)
+Any key / value parameters supplied here will be added to the ODBC 
connection string.
+
+.. note::
+
+To use the hook 
:py:class:`~airflow.providers.odbc.hooks.odbc.OdbcHook` you must specify the
+driver you want to use, in the ``Connection.extra`` field or as a 
parameter at hook initialization.
+
+For example, consider the following value for ``extra``:
+
+.. code-block:: json
+
+{
+  "Driver": "ODBC Driver 17 for SQL Server",
+  "ApplicationIntent": "ReadOnly",
+  "TrustedConnection": "Yes"
+}
+
+This would produce a connection string containing these params:
+
+.. code-block::
+
+DRIVER={ODBC Driver 17 for SQL 
Server};ApplicationIntent=ReadOnly;TrustedConnection=Yes;
 
 Review comment:
   Not sure if it's worth mentioning that the `odbc.ini` and maybe some other 
file might need to be installed in the appropriate directory for this to work.
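   
   As a quick sanity check (a hedged aside, not part of the proposed doc), pyodbc can list the drivers the driver manager actually sees once those files are in place:
   
```python
import pyodbc

# Should include the entry registered in odbcinst.ini,
# e.g. 'ODBC Driver 17 for SQL Server'.
print(pyodbc.drivers())
```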


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] vamega commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
vamega commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367022367
 
 

 ##
 File path: airflow/providers/odbc/hooks/odbc.py
 ##
 @@ -0,0 +1,220 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module contains ODBC hook.
+"""
+from typing import Optional
+from urllib.parse import quote_plus
+
+import pyodbc
+
+from airflow.hooks.dbapi_hook import DbApiHook
+from airflow.utils.helpers import merge_dicts
+
+
+class OdbcHook(DbApiHook):
+"""
+Interact with odbc data sources using pyodbc.
+
+See :ref:`howto/connection/odbc` for full documentation.
+"""
+
+DEFAULT_SQLALCHEMY_SCHEME = 'mssql+pyodbc'
+conn_name_attr = 'odbc_conn_id'
+default_conn_name = 'odbc_default'
+supports_autocommit = True
+
+def __init__(
+self,
+*args,
+database: Optional[str] = None,
+driver: Optional[str] = None,
+dsn: Optional[str] = None,
+connect_kwargs: Optional[dict] = None,
+sqlalchemy_scheme: Optional[str] = None,
+**kwargs,
+):
+"""
+:param args: passed to DbApiHook
+:param database: database to use -- overrides connection ``schema``
+:param driver: name of driver or path to driver. overrides driver 
supplied in connection ``extra``
+:param dsn: name of DSN to use.  overrides DSN supplied in connection 
``extra``
+:param connect_kwargs: keyword arguments passed to ``pyodbc.connect``
+:param sqlalchemy_scheme: Scheme sqlalchemy connection.  Default is 
``mssql+pyodbc`` Only used for
+  ``get_sqlalchemy_engine`` and ``get_sqlalchemy_connection`` methods.
+:param kwargs: passed to DbApiHook
+"""
+super().__init__(*args, **kwargs)
+self._database = database
+self._driver = driver
+self._dsn = dsn
+self._conn_str = None
+self._sqlalchemy_scheme = sqlalchemy_scheme
+self._connection = None
+self._connect_kwargs = connect_kwargs
+
+@property
+def connection(self):
+"""
+``airflow.Connection`` object with connection id ``odbc_conn_id``
+"""
+if not self._connection:
+self._connection = self.get_connection(getattr(self, 
self.conn_name_attr))
+return self._connection
+
+@property
+def database(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return self._database or self.connection.schema
+
+@property
+def sqlalchemy_scheme(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return (
+self._sqlalchemy_scheme or
+self.connection_extra_lower.get('sqlalchemy_scheme') or
+self.DEFAULT_SQLALCHEMY_SCHEME
+)
+
+@property
+def connection_extra_lower(self):
+"""
+``connection.extra_dejson`` but where keys are converted to lower case.
+
+This is used internally for case-insensitive access of odbc params.
+"""
+return {k.lower(): v for k, v in self.connection.extra_dejson.items()}
+
+@property
+def driver(self):
+"""
+Driver from init param if given; else try to find one in connection 
extra.
+"""
+if not self._driver:
+driver = self.connection_extra_lower.get('driver')
+if driver:
+self._driver = driver
+return self._driver and 
self._driver.strip().lstrip('{').rstrip('}').strip()
+
+@property
+def dsn(self):
+"""
+DSN from init param if given; else try to find one in connection extra.
+"""
+if not self._dsn:
+dsn = self.connection_extra_lower.get('dsn')
+if dsn:
+self._dsn = dsn.strip()
+return self._dsn
+
+@property
+def odbc_connection_string(self):
+"""
+ODBC connection string
+We build connection string instead of using ``pyodbc.connect`` params 
because, for example, 

[GitHub] [airflow] vamega commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
vamega commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367021605
 
 

 ##
 File path: airflow/providers/odbc/hooks/odbc.py
 ##
 @@ -0,0 +1,220 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module contains ODBC hook.
+"""
+from typing import Optional
+from urllib.parse import quote_plus
+
+import pyodbc
+
+from airflow.hooks.dbapi_hook import DbApiHook
+from airflow.utils.helpers import merge_dicts
+
+
+class OdbcHook(DbApiHook):
+"""
+Interact with odbc data sources using pyodbc.
+
+See :ref:`howto/connection/odbc` for full documentation.
+"""
+
+DEFAULT_SQLALCHEMY_SCHEME = 'mssql+pyodbc'
+conn_name_attr = 'odbc_conn_id'
+default_conn_name = 'odbc_default'
+supports_autocommit = True
+
+def __init__(
+self,
+*args,
+database: Optional[str] = None,
+driver: Optional[str] = None,
+dsn: Optional[str] = None,
+connect_kwargs: Optional[dict] = None,
+sqlalchemy_scheme: Optional[str] = None,
+**kwargs,
+):
+"""
+:param args: passed to DbApiHook
+:param database: database to use -- overrides connection ``schema``
+:param driver: name of driver or path to driver. overrides driver 
supplied in connection ``extra``
+:param dsn: name of DSN to use.  overrides DSN supplied in connection 
``extra``
+:param connect_kwargs: keyword arguments passed to ``pyodbc.connect``
+:param sqlalchemy_scheme: Scheme sqlalchemy connection.  Default is 
``mssql+pyodbc`` Only used for
+  ``get_sqlalchemy_engine`` and ``get_sqlalchemy_connection`` methods.
+:param kwargs: passed to DbApiHook
+"""
+super().__init__(*args, **kwargs)
+self._database = database
+self._driver = driver
+self._dsn = dsn
+self._conn_str = None
+self._sqlalchemy_scheme = sqlalchemy_scheme
+self._connection = None
+self._connect_kwargs = connect_kwargs
+
+@property
+def connection(self):
+"""
+``airflow.Connection`` object with connection id ``odbc_conn_id``
+"""
+if not self._connection:
+self._connection = self.get_connection(getattr(self, 
self.conn_name_attr))
+return self._connection
+
+@property
+def database(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return self._database or self.connection.schema
+
+@property
+def sqlalchemy_scheme(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return (
+self._sqlalchemy_scheme or
+self.connection_extra_lower.get('sqlalchemy_scheme') or
+self.DEFAULT_SQLALCHEMY_SCHEME
+)
+
+@property
+def connection_extra_lower(self):
+"""
+``connection.extra_dejson`` but where keys are converted to lower case.
+
+This is used internally for case-insensitive access of odbc params.
+"""
+return {k.lower(): v for k, v in self.connection.extra_dejson.items()}
+
+@property
+def driver(self):
+"""
+Driver from init param if given; else try to find one in connection 
extra.
+"""
+if not self._driver:
+driver = self.connection_extra_lower.get('driver')
+if driver:
+self._driver = driver
+return self._driver and 
self._driver.strip().lstrip('{').rstrip('}').strip()
+
+@property
+def dsn(self):
+"""
+DSN from init param if given; else try to find one in connection extra.
+"""
+if not self._dsn:
+dsn = self.connection_extra_lower.get('dsn')
+if dsn:
+self._dsn = dsn.strip()
+return self._dsn
+
+@property
+def odbc_connection_string(self):
+"""
+ODBC connection string
+We build connection string instead of using ``pyodbc.connect`` params 
because, for example, 

[GitHub] [airflow] vamega commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC hook & deprecation warning for pymssql

2020-01-16 Thread GitBox
vamega commented on a change in pull request #6850: [AIRFLOW-6296] add ODBC 
hook & deprecation warning for pymssql
URL: https://github.com/apache/airflow/pull/6850#discussion_r367530889
 
 

 ##
 File path: airflow/providers/odbc/hooks/odbc.py
 ##
 @@ -0,0 +1,220 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module contains ODBC hook.
+"""
+from typing import Optional
+from urllib.parse import quote_plus
+
+import pyodbc
+
+from airflow.hooks.dbapi_hook import DbApiHook
+from airflow.utils.helpers import merge_dicts
+
+
+class OdbcHook(DbApiHook):
+"""
+Interact with odbc data sources using pyodbc.
+
+See :ref:`howto/connection/odbc` for full documentation.
+"""
+
+DEFAULT_SQLALCHEMY_SCHEME = 'mssql+pyodbc'
+conn_name_attr = 'odbc_conn_id'
+default_conn_name = 'odbc_default'
+supports_autocommit = True
+
+def __init__(
+self,
+*args,
+database: Optional[str] = None,
+driver: Optional[str] = None,
+dsn: Optional[str] = None,
+connect_kwargs: Optional[dict] = None,
+sqlalchemy_scheme: Optional[str] = None,
+**kwargs,
+):
+"""
+:param args: passed to DbApiHook
+:param database: database to use -- overrides connection ``schema``
+:param driver: name of driver or path to driver. overrides driver 
supplied in connection ``extra``
+:param dsn: name of DSN to use.  overrides DSN supplied in connection 
``extra``
+:param connect_kwargs: keyword arguments passed to ``pyodbc.connect``
+:param sqlalchemy_scheme: Scheme sqlalchemy connection.  Default is 
``mssql+pyodbc`` Only used for
+  ``get_sqlalchemy_engine`` and ``get_sqlalchemy_connection`` methods.
+:param kwargs: passed to DbApiHook
+"""
+super().__init__(*args, **kwargs)
+self._database = database
+self._driver = driver
+self._dsn = dsn
+self._conn_str = None
+self._sqlalchemy_scheme = sqlalchemy_scheme
+self._connection = None
+self._connect_kwargs = connect_kwargs
+
+@property
+def connection(self):
+"""
+``airflow.Connection`` object with connection id ``odbc_conn_id``
+"""
+if not self._connection:
+self._connection = self.get_connection(getattr(self, 
self.conn_name_attr))
+return self._connection
+
+@property
+def database(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return self._database or self.connection.schema
+
+@property
+def sqlalchemy_scheme(self):
+"""
+Database provided in init if exists; otherwise, ``schema`` from 
``Connection`` object.
+"""
+return (
+self._sqlalchemy_scheme or
+self.connection_extra_lower.get('sqlalchemy_scheme') or
+self.DEFAULT_SQLALCHEMY_SCHEME
+)
+
+@property
+def connection_extra_lower(self):
+"""
+``connection.extra_dejson`` but where keys are converted to lower case.
+
+This is used internally for case-insensitive access of odbc params.
+"""
+return {k.lower(): v for k, v in self.connection.extra_dejson.items()}
+
+@property
+def driver(self):
+"""
+Driver from init param if given; else try to find one in connection 
extra.
+"""
+if not self._driver:
+driver = self.connection_extra_lower.get('driver')
+if driver:
+self._driver = driver
+return self._driver and 
self._driver.strip().lstrip('{').rstrip('}').strip()
 
 Review comment:
   Are we stripping `{` and `}` just to be extra defensive?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5501) in_cluster default value in KubernetesPodOperator overwrites configuration

2020-01-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017306#comment-17017306
 ] 

ASF subversion and git services commented on AIRFLOW-5501:
--

Commit e54fba5b479e36ecad6afb8d3920a534af6e6135 in airflow's branch 
refs/heads/master from Quentin Lemaire
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=e54fba5 ]

[AIRFLOW-5501] Make default `in_cluster` value in KubernetesPodOperator respect 
config (#6124)

The default value of the parameter in_cluster of the
kube_client.get_kube_client function is
in_cluster=conf.getboolean('kubernetes', 'in_cluster'). Therefore, the
expected behavior is that when in_cluster is not set, it takes the
value in the configuration file.

However, the default value of in_cluster in KubernetesPodOperator.py is
False and in_cluster is passed as a parameter when calling the
kube_client.get_kube_client function. Therefore, it changes the
expected behavior by overwriting the default value. When in_cluster is
not set when initializing KubernetesPodOperator, the value of in_cluster
in kube_client.get_kube_client is False and not the value which is in
the configuration file.

Therefore, the default value of in_cluster in KubernetesPodOperator has
been changed to None and will not be passed to get_kube_client if it is
not overwritten so that it takes the configuration value as a default
value.

Co-authored-by: Ash Berlin-Taylor 
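
The pattern the commit describes, as a minimal standalone sketch (names and signatures simplified for illustration; not the real operator code):

```python
from typing import Optional


def get_kube_client(in_cluster: bool = True):
    # In Airflow the real default comes from conf.getboolean('kubernetes', 'in_cluster').
    ...


class PodOperatorSketch:
    def __init__(self, in_cluster: Optional[bool] = None):
        self.in_cluster = in_cluster  # None means "not set by the user"

    def client(self):
        if self.in_cluster is not None:
            return get_kube_client(in_cluster=self.in_cluster)
        # Omit the argument entirely so the configuration-driven default applies.
        return get_kube_client()
```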

> in_cluster default value in KubernetesPodOperator overwrites configuration
> --
>
> Key: AIRFLOW-5501
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5501
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.5
>Reporter: Quentin Lemaire
>Priority: Major
> Fix For: 1.10.8
>
>
> Hi!
> The default value of the parameter *in_cluster* of the 
> *kube_client.get_kube_client* function is 
> *in_cluster=conf.getboolean('kubernetes', 'in_cluster').* Therefore, the 
> expected behavior is that, when *in_cluster* is not set, it takes the value 
> in the configuration file.
> However, the default value of *in_cluster* in *KubernetesPodOperator.py* is 
> False and *in_cluster* is passed as a parameter when calling the 
> *kube_client.get_kube_client* function. Therefore, it changes the expected 
> behavior by overwriting the default value. When *in_cluster* is not set when 
> initializing *KubernetesPodOperator*, the value of *in_cluster* in 
> *kube_client.get_kube_client* is False and not the value which is in the 
> configuration file.
> It is quite confusing because it can feel like the value in the configuration 
> file is not working properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5501) in_cluster default value in KubernetesPodOperator overwrites configuration

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017305#comment-17017305
 ] 

ASF GitHub Bot commented on AIRFLOW-5501:
-

ashb commented on pull request #6124: [AIRFLOW-5501] in_cluster default value 
in KubernetesPodOperator overwrites configuration
URL: https://github.com/apache/airflow/pull/6124
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> in_cluster default value in KubernetesPodOperator overwrites configuration
> --
>
> Key: AIRFLOW-5501
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5501
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.5
>Reporter: Quentin Lemaire
>Priority: Major
>
> Hi!
> The default value of the parameter *in_cluster* of the 
> *kube_client.get_kube_client* function is 
> *in_cluster=conf.getboolean('kubernetes', 'in_cluster').* Therefore, the 
> expected behavior is that, when *in_cluster* is not set, it takes the value 
> in the configuration file.
> However, the default value of *in_cluster* in *KubernetesPodOperator.py* is 
> False and *in_cluster* is passed as a parameter when calling the 
> *kube_client.get_kube_client* function. Therefore, it changes the expected 
> behavior by overwriting the default value. When *in_cluster* is not set when 
> initializing *KubernetesPodOperator*, the value of *in_cluster* in 
> *kube_client.get_kube_client* is False and not the value which is in the 
> configuration file.
> It is quite confusing because it can feel like the value in the configuration 
> file is not working properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4030) Add Singularity Container Operator

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017309#comment-17017309
 ] 

ASF GitHub Bot commented on AIRFLOW-4030:
-

vsoch commented on pull request #7191: [AIRFLOW-4030] second attempt to add 
singularity to airflow
URL: https://github.com/apache/airflow/pull/7191
 
 
   This is a second attempt to add Singularity Container support to Apache 
Airflow by way of Singularity Python. I am using the previously created JIRA 
ticket 4030 (created in March 2019) as it is still relevant. I am a new 
contributor and largely not familiar with the community here (and yes I've read 
the guidelines) so I would appreciate support and kindness from the individuals 
that act as maintainers here, and any additional support from other folks that 
are also interested in this integration. Thank you!
   
   Signed-off-by: Vanessa Sochat 
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-]`. AIRFLOW- = 
JIRA ID*
   - [ ] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [ ] Relevant documentation is updated including usage instructions.
   - [ ] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Singularity Container Operator
> --
>
> Key: AIRFLOW-4030
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4030
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Vanessa
>Assignee: Vanessa
>Priority: Minor
>
> Singularity containers are comparable to Docker in the level of operation - 
> they provide an encapsulated environment with an OS, libraries, and custom 
> software for the user to run. The key difference is that Docker is not 
> optimized for scientific compute because it could never be installed on a 
> shared research cluster. Singularity, on the other hand, does not have these 
> issues and is installed across HPC centers internationally.
> This issue is to add Singularity containers as an operator to Apache Airflow, 
> so that we can start to explore using airflow in an HPC environment. I work 
> with Encode DCC at Stanford, and am hopeful to explore Airflow as an 
> alternative to the workflow manager(s) we are using. I am one of the 
> [original Singularity developers see 
> |https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177459] 
> that manages the Singularity Python client (spython), Singularity Hub and 
> Singularity Registry Server, and have started working on this issue here: 
> [https://github.com/apache/airflow/pull/4846.] Looking forward to working 
> with you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-5501) in_cluster default value in KubernetesPodOperator overwrites configuration

2020-01-16 Thread Ash Berlin-Taylor (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-5501.

Fix Version/s: 1.10.8
   Resolution: Fixed

> in_cluster default value in KubernetesPodOperator overwrites configuration
> --
>
> Key: AIRFLOW-5501
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5501
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.5
>Reporter: Quentin Lemaire
>Priority: Major
> Fix For: 1.10.8
>
>
> Hi!
> The default value of the parameter *in_cluster* of the 
> *kube_client.get_kube_client* function is 
> *in_cluster=conf.getboolean('kubernetes', 'in_cluster').* Therefore, the 
> expected behavior is that, when *in_cluster* is not set, it takes the value 
> in the configuration file.
> However, the default value of *in_cluster* in *KubernetesPodOperator.py* is 
> False and *in_cluster* is passed as a parameter when calling the 
> *kube_client.get_kube_client* function. Therefore, it changes the expected 
> behavior by overwriting the default value. When *in_cluster* is not set when 
> initializing *KubernetesPodOperator*, the value of *in_cluster* in 
> *kube_client.get_kube_client* is False and not the value which is in the 
> configuration file.
> It is quite confusing because it can feel like the value in the configuration 
> file is not working properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] vsoch opened a new pull request #7191: [AIRFLOW-4030] second attempt to add singularity to airflow

2020-01-16 Thread GitBox
vsoch opened a new pull request #7191: [AIRFLOW-4030] second attempt to add 
singularity to airflow
URL: https://github.com/apache/airflow/pull/7191
 
 
   This is a second attempt to add Singularity Container support to Apache 
Airflow by way of Singularity Python. I am using the previously created JIRA 
ticket 4030 (created in March 2019) as it is still relevant. I am a new 
contributor and largely not familiar with the community here (and yes I've read 
the guidelines) so I would appreciate support and kindness from the individuals 
that act as maintainers here, and any additional support from other folks that 
are also interested in this integration. Thank you!
   
   Signed-off-by: Vanessa Sochat 
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-]`. AIRFLOW- = 
JIRA ID*
   - [ ] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [ ] Relevant documentation is updated including usage instructions.
   - [ ] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb merged pull request #6124: [AIRFLOW-5501] in_cluster default value in KubernetesPodOperator overwrites configuration

2020-01-16 Thread GitBox
ashb merged pull request #6124: [AIRFLOW-5501] in_cluster default value in 
KubernetesPodOperator overwrites configuration
URL: https://github.com/apache/airflow/pull/6124
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-4030) Add Singularity Container Operator

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017291#comment-17017291
 ] 

ASF GitHub Bot commented on AIRFLOW-4030:
-

vsoch commented on pull request #4846: [AIRFLOW-4030] adding start to 
singularity for airflow
URL: https://github.com/apache/airflow/pull/4846
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Singularity Container Operator
> --
>
> Key: AIRFLOW-4030
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4030
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Vanessa
>Assignee: Vanessa
>Priority: Minor
>
> Singularity containers are comparable to Docker in the level of operation - 
> they provide an encapsulated environment with an OS, libraries, and custom 
> software for the user to run. The key difference is that Docker is not 
> optimized for scientific compute because it could never be installed on a 
> shared research cluster. Singularity, on the other hand, does not have these 
> issues and is installed across HPC centers internationally.
> This issue is to add Singularity containers as an operator to Apache Airflow, 
> so that we can start to explore using airflow in an HPC environment. I work 
> with Encode DCC at Stanford, and am hopeful to explore Airflow as an 
> alternative to the workflow manager(s) we are using. I am one of the 
> [original Singularity developers see 
> |https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177459] 
> that manages the Singularity Python client (spython), Singularity Hub and 
> Singularity Registry Server, and have started working on this issue here: 
> [https://github.com/apache/airflow/pull/4846.] Looking forward to working 
> with you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] vsoch closed pull request #4846: [AIRFLOW-4030] adding start to singularity for airflow

2020-01-16 Thread GitBox
vsoch closed pull request #4846: [AIRFLOW-4030] adding start to singularity for 
airflow
URL: https://github.com/apache/airflow/pull/4846
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] vsoch commented on issue #4846: [AIRFLOW-4030] adding start to singularity for airflow

2020-01-16 Thread GitBox
vsoch commented on issue #4846: [AIRFLOW-4030] adding start to singularity for 
airflow
URL: https://github.com/apache/airflow/pull/4846#issuecomment-575240752
 
 
   > Hello @vsoch -> I am sorry you have this extra work, but unfortunately it 
happens and this is part of our normal process. This is how - deeply 
asynchronous - the process of community-driven project is. There is no "your 
team" here - there are individual contributors and committers and even 
individual PMC members that are working mostly in their spare time and find the 
time outside of their normal work, at the expense of their families and 
personal time to review and comment on other's code and make it "ready" to 
submit. So there is no "your team" here. There are individual people. And 
sometimes they have other obligations and are simply swamped with their regular 
job so things usually take longer than in a "commercial" setting.
   
   This is completely understandable, and the norm for most open source 
projects that don't have some company (or similar) backing. The feedback I'd 
give is that regardless of an individual being a maintainer or an actual team, 
a new contributor should be welcomed and supported throughout the process. It's 
the individuals of any project doing the reviewing that really make or break 
the culture.
   
   > Yet we have much higher requirements (tests, documentation, quality of 
code, good architecture) precisely because we have no formal structure and no 
way to divide responsibilities - we have to make sure the code can be picked up 
by anyone, anytime and it is fully covered by automated tests.
   
   Also understandable, and in this case, as a new contributor, it would be 
helpful to have someone give guidance about what needs to be done, tested, etc. 
   
   > I often have PRs opened for weeks or even months until I have time and got 
enough feedback and buy-in from others to be ready to merge - especially if a 
commit is big. And sometimes my idea is ahead of its time and it has to wait 
for months (even years) to be finally implemented (happened to me!). And I am 
humble enough and persistent enough to continue rebasing my code - because I 
understand others cannot just stop their work while the iterations happen. 
Sometimes rebasing 30-40 times. 
   
   30-40 times seems a bit excessive, but I don't want to judge. It seems like 
there should be better organization around at least setting expectations for 
contribution. In my case, I neither knew what to do, I didn't feel empowered to 
do anything, and I didn't understand the architecture well enough or have 
enough experience with the community to know what I was supposed to do.
   
   > And I am trying to not be bitter about it - I try to be empathic about it 
and understand that there are people - not machines on the other side. Actually 
the strategy I use - is to rebase often to avoid one big rebase. It makes it so 
much easier. And rebasing is usually easy - especially that you get conflicts 
ONLY about the files that you touched as well. You do not need to understand 
the whole codebase. I can really recommend tools lik IntelliJ/PyCharm - they 
have fantastic support for solving conflicts and usually it is very easy. And 
only you can do it I am afraid. And it will happen - because other people work 
in parallel. That's just a reality of the project.
   
   Yes, and to this point I'd say touche - there are people on the other side 
that aren't bitter, but need guidance. I've been an open source developer for 
over a decade and I'm well aware about rebasing, but more importantly, people 
and communities feel very differently about it. The sentiment here seems to be 
that it's in favor, but the communication was poor so the final result is a 
mess.
   
   > I think a lot of people would like to see singularity support and after 
initial slow uptick it seems that it's time came. This also means that you will 
get a lot of comments, and you will have to meet the quality bar to get merged 
and sometimes people will have different opinion and you will have to find 
consensus.
   
   Yes, and I would also need guidance from somewhere about what needs to be 
fixed, how to update the PR, otherwise it's just confusing.
   
   I feel like you are taking the defense here, and I certainly didn't do 
anything wrong, so I want to offer a clean slate and suggest that discussion 
stop being focused around who is at fault and why, and how we can move forward 
to fix this. Currently there is a publication out with misleading / incorrect 
information and I'd suggest that effort is put into fixing that. 
   
   I will take the initiative and re-open the PR against the current version, 
and I look forward to better interactions with the _individuals_ in your 
community.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL 

[GitHub] [airflow] ashb commented on issue #7153: [AIRFLOW-5117] Fixed Timestamp bug in RefreshKubeConfigLoader

2020-01-16 Thread GitBox
ashb commented on issue #7153: [AIRFLOW-5117] Fixed Timestamp bug in 
RefreshKubeConfigLoader
URL: https://github.com/apache/airflow/pull/7153#issuecomment-575237800
 
 
   Change looks good now, but as this JIRA was already released we need a new 
JIRA please. (We use the Fix Version to know what to pull back to our releases, 
and as AIRFLOW-5117 is already released we would otherwise miss this change 
when building 1.10.8)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io commented on issue #6792: [AIRFLOW-5930] Use cached-SQL query building for hot-path queries

2020-01-16 Thread GitBox
codecov-io commented on issue #6792: [AIRFLOW-5930] Use cached-SQL query 
building for hot-path queries
URL: https://github.com/apache/airflow/pull/6792#issuecomment-575233091
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6792?src=pr=h1) 
Report
   > Merging 
[#6792](https://codecov.io/gh/apache/airflow/pull/6792?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/59c8a826b8d5d365db68e800cea3de59256530c9?src=pr=desc)
 will **increase** coverage by `0.46%`.
   > The diff coverage is `95.77%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6792/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6792?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#6792  +/-   ##
   ==
   + Coverage   84.91%   85.37%   +0.46% 
   ==
 Files 723  723  
 Lines   3954639544   -2 
   ==
   + Hits3358133762 +181 
   + Misses   5965 5782 -183
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6792?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/api/common/experimental/trigger\_dag.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC90cmlnZ2VyX2RhZy5weQ==)
 | `97.91% <100%> (-0.05%)` | :arrow_down: |
   | 
[airflow/ti\_deps/deps/dagrun\_exists\_dep.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcHMvZGFncnVuX2V4aXN0c19kZXAucHk=)
 | `88.88% <100%> (ø)` | :arrow_up: |
   | 
[airflow/api/common/experimental/mark\_tasks.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9tYXJrX3Rhc2tzLnB5)
 | `95.48% <100%> (ø)` | :arrow_up: |
   | 
[airflow/operators/subdag\_operator.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc3ViZGFnX29wZXJhdG9yLnB5)
 | `94.79% <100%> (-0.06%)` | :arrow_down: |
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `76.07% <100%> (ø)` | :arrow_up: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `96.77% <100%> (+3.44%)` | :arrow_up: |
   | 
[airflow/cli/commands/dag\_command.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy9jbGkvY29tbWFuZHMvZGFnX2NvbW1hbmQucHk=)
 | `86.2% <100%> (ø)` | :arrow_up: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `91.81% <100%> (+1.37%)` | :arrow_up: |
   | 
[airflow/ti\_deps/deps/trigger\_rule\_dep.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcHMvdHJpZ2dlcl9ydWxlX2RlcC5weQ==)
 | `91.95% <100%> (+1.16%)` | :arrow_up: |
   | 
[airflow/models/dagrun.py](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFncnVuLnB5)
 | `95.89% <93.02%> (-0.27%)` | :arrow_down: |
   | ... and [17 
more](https://codecov.io/gh/apache/airflow/pull/6792/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6792?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6792?src=pr=footer). 
Last update 
[59c8a82...eceff22](https://codecov.io/gh/apache/airflow/pull/6792?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] davlum commented on issue #6230: [AIRFLOW-5413] Allow K8S worker pod to be configured from JSON/YAML file

2020-01-16 Thread GitBox
davlum commented on issue #6230: [AIRFLOW-5413] Allow K8S worker pod to be 
configured from JSON/YAML file
URL: https://github.com/apache/airflow/pull/6230#issuecomment-575232284
 
 
   @hmike96 `full_pod_spec` is not yet released. [Here is the source code for 
the KubernetesPodOperator in Airflow 
v1.10.7](https://github.com/apache/airflow/blob/1.10.7/airflow/contrib/operators/kubernetes_pod_operator.py).
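
   For reference, a hedged sketch of what passing `full_pod_spec` might look like once the feature ships (this assumes the parameter keeps that name and accepts a `kubernetes.client` `V1Pod` object rather than a plain dict; the task and pod names are made up):
   ```python
   from kubernetes.client import models as k8s

   pod = k8s.V1Pod(
       metadata=k8s.V1ObjectMeta(name="datadog-fail", labels={"datadog": "alert"}),
       spec=k8s.V1PodSpec(
           containers=[k8s.V1Container(name="base", image="busybox", args=["FAIL"])]
       ),
   )

   # Not valid on Airflow 1.10.7 -- the argument only exists on master at this point:
   # KubernetesPodOperator(task_id="fail-ivt-job-notification", full_pod_spec=pod, ...)
   ```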


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] hmike96 commented on issue #6230: [AIRFLOW-5413] Allow K8S worker pod to be configured from JSON/YAML file

2020-01-16 Thread GitBox
hmike96 commented on issue #6230: [AIRFLOW-5413] Allow K8S worker pod to be 
configured from JSON/YAML file
URL: https://github.com/apache/airflow/pull/6230#issuecomment-575229660
 
 
   Is the full_pod_spec feature released yet? I tried it out but get this warning, as if it is not released:
   `
   /usr/local/lib/python3.7/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py:158: PendingDeprecationWarning: Invalid arguments were passed to KubernetesPodOperator (task_id: fail-ivt-job-notification). Support for passing such arguments will be dropped in Airflow 2.0. Invalid arguments were:
   *args: ()
   **kwargs: {'full_pod_spec': {'api_version': 'v1',
 'kind': 'Pod',
 'metadata': {'annotations': None,
  'cluster_name': None,
  'creation_timestamp': None,
  'deletion_grace_period_seconds': None,
  'deletion_timestamp': None,
  'finalizers': None,
  'generate_name': None,
  'generation': None,
  'initializers': None,
  'labels': {'datadog': 'alert'},
  'managed_fields': None,
  'name': 'datadogFail',
  'namespace': None,
  'owner_references': None,
  'resource_version': None,
  'self_link': None,
  'uid': None},
 'spec': {'active_deadline_seconds': None,
  'affinity': None,
  'automount_service_account_token': None,
  'containers': [{'args': ['FAIL'],
  'command': None,

[GitHub] [airflow] nuclearpinguin merged pull request #7189: [AIRFLOW-XXXX] Move email configuration from the concept page

2020-01-16 Thread GitBox
nuclearpinguin merged pull request #7189: [AIRFLOW-] Move email 
configuration from the concept page
URL: https://github.com/apache/airflow/pull/7189
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] dazza-codes commented on a change in pull request #7027: [WIP][AIRFLOW-6441] KnativeExecutor

2020-01-16 Thread GitBox
dazza-codes commented on a change in pull request #7027: [WIP][AIRFLOW-6441] 
KnativeExecutor
URL: https://github.com/apache/airflow/pull/7027#discussion_r367505849
 
 

 ##
 File path: airflow/cli/commands/knative_worker_command.py
 ##
 @@ -0,0 +1,67 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""knative worker command"""
+
+import signal
+import subprocess
+import sys
+import time
+
+from airflow.utils import cli as cli_utils
+
+
+@cli_utils.action_logging
+def knative_worker(args):
+"""
+Launches knative servers as Gunicorn processes
+@param args:
+"""
+num_workers = args.workers or 8
+# worker_timeout = (args.worker_timeout or
+#   conf.get('webserver', 'web_server_worker_timeout'))
+worker_timeout = 1
+hostname = args.hostname or "0.0.0.0"
+port = args.port or "8081"
+run_args = [
+'gunicorn',
+'-w', str(num_workers),
+'-k', 'sync',
+'-t', str(worker_timeout),
+'-b', str(hostname) + ':' + str(port),
+'-n', 'airflow-worker',
+'-c', 'python:airflow.www.gunicorn_config',
+'airflow.knative_worker.knative_task_runner:create_app()'
+]
+
+def monitor_gunicorn(gunicorn_master_proc):
+while gunicorn_master_proc.poll() is None:
+time.sleep(1)
+sys.exit(gunicorn_master_proc.returncode)
+
+def kill_proc():
+gunicorn_master_proc.terminate()
+gunicorn_master_proc.wait()
+sys.exit(0)
+
+gunicorn_master_proc = subprocess.Popen(run_args, close_fds=True)
+
+signal.signal(signal.SIGINT, kill_proc)
+signal.signal(signal.SIGTERM, kill_proc)
+
+monitor_gunicorn(gunicorn_master_proc)
 
 Review comment:
   If I understand this correctly, it needs to hang on to a process to monitor 
the worker and the monitoring is a blocking operation to poll for the status of 
the worker and cleanup as necessary.  The following suggestion is out of scope 
on this PR - it might be cool if an asyncio event loop could use coroutines to 
monitor the process state and create callbacks to handle cleanup.  Some related 
refs on this topic:
   - https://pymotw.com/3/asyncio/executors.html
 - event loop can manage blocking tasks with threads or process pools
   - 
https://gist.github.com/seglberg/0b4487b57b4fd425c56ad72aba9971be#file-grpc_asyncio-py-L21
 - interesting way to combine asyncio event loop with concurrent.futures 
executor model
   
   Another interesting piece of background is the custom loop at the end of 
this blog post, where it uses a priority queue for scheduling in the event-loop 
(but this is purely academic/educational):
   - https://snarky.ca/how-the-heck-does-async-await-work-in-python-3-5/
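   
   A rough sketch of that out-of-scope idea, just to show the shape of a non-blocking monitor (assumptions: Unix-only signal handling, command line made up; this is not part of the PR):
   ```python
   import asyncio
   import signal

   async def monitor(cmd):
       """Start the worker process and wait for it without a blocking poll loop."""
       proc = await asyncio.create_subprocess_exec(*cmd)
       loop = asyncio.get_running_loop()
       # Ask the child to terminate when the monitor itself receives a signal.
       loop.add_signal_handler(signal.SIGTERM, proc.terminate)
       loop.add_signal_handler(signal.SIGINT, proc.terminate)
       return await proc.wait()  # suspends this coroutine instead of time.sleep(1)

   # exit_code = asyncio.run(monitor(["gunicorn", "-w", "8", "-b", "0.0.0.0:8081",
   #                                  "myapp:create_app()"]))
   ```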


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] dazza-codes commented on a change in pull request #7027: [WIP][AIRFLOW-6441] KnativeExecutor

2020-01-16 Thread GitBox
dazza-codes commented on a change in pull request #7027: [WIP][AIRFLOW-6441] 
KnativeExecutor
URL: https://github.com/apache/airflow/pull/7027#discussion_r367507948
 
 

 ##
 File path: airflow/executors/knative_executor.py
 ##
 @@ -0,0 +1,210 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module contains KnativeExecutor which is optimized
+for running lots of short tasks in a scalable, efficient manner.
+"""
+
+import asyncio
+import datetime
+import functools
+import multiprocessing
+
+import aiohttp
+from cached_property import cached_property
+
+from airflow.configuration import conf
+from airflow.exceptions import AirflowConfigException, AirflowException
+from airflow.executors.base_executor import BaseExecutor
+from airflow.executors.kubernetes_executor import KubernetesExecutor
+from airflow.utils.log.logging_mixin import LoggingMixin
+from airflow.utils.state import State
+
+
+async def make_request_async(
+task_id,
+dag_id,
+execution_date,
+host,
+log,
+host_header=None,
+) -> aiohttp.ClientResponse:
+"""
+
+This function crafts a request to an external airflow server. This server is assumed to be
+a knative service.
+@param task_id:
+@param dag_id:
+@param execution_date:
+@param host:
+@param log:
+@param host_header:
+@return:
+"""
+req = "http://" + host + "/run"
+
+date = int(datetime.datetime.timestamp(execution_date))
+params = {
+"task_id": task_id,
+"dag_id": dag_id,
+"execution_date": date,
+}
+headers = {}
+if host_header:
+headers["Host"] = host_header
+timeout = aiohttp.ClientTimeout(total=6)
+async with aiohttp.ClientSession(timeout=timeout) as session:
+async with session.get(url=req, params=params, headers=headers) as 
resp:
+log.info(resp.status)
+log.info(await resp.text())
+return resp
+
+
+class KnativeRequestLoop(multiprocessing.Process, LoggingMixin):
+"""
+
+This class asynchronously pulls tasks from the KnativeExecutor and runs 
them as coroutines using asyncio
+
+"""
 
 Review comment:
   +1 - also consider this pattern at
   - 
https://gist.github.com/seglberg/0b4487b57b4fd425c56ad72aba9971be#file-grpc_asyncio-py-L21
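   
   For context, a minimal sketch of the general pattern being referenced (an asyncio event loop combined with a concurrent.futures executor; illustrative only, not the gist itself and not this PR's code):
   ```python
   import asyncio
   from concurrent.futures import ThreadPoolExecutor

   def blocking_task(n):
       # placeholder for CPU- or IO-bound work that is not async-aware
       return n * n

   async def main():
       loop = asyncio.get_running_loop()
       with ThreadPoolExecutor(max_workers=4) as pool:
           # schedule the blocking calls on the pool and await them as coroutines
           results = await asyncio.gather(
               *(loop.run_in_executor(pool, blocking_task, n) for n in range(8))
           )
       print(results)

   # asyncio.run(main())
   ```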


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #7128: [AIRFLOW-6529] Pickle error occurs when the scheduler tries to run on macOS.

2020-01-16 Thread GitBox
codecov-io edited a comment on issue #7128: [AIRFLOW-6529] Pickle error occurs 
when the scheduler tries to run on macOS.
URL: https://github.com/apache/airflow/pull/7128#issuecomment-573300577
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7128?src=pr=h1) 
Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@ea5853f`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/7128/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/7128?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master#7128   +/-   ##
   =
 Coverage  ?   85.08%   
   =
 Files ?  723   
 Lines ?39564   
 Branches  ?0   
   =
 Hits  ?33663   
 Misses? 5901   
 Partials  ?0
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/7128?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/7128/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `88.24% <ø> (ø)` | |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/7128/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `89.16% <100%> (ø)` | |
   | 
[airflow/plugins\_manager.py](https://codecov.io/gh/apache/airflow/pull/7128/diff?src=pr=tree#diff-YWlyZmxvdy9wbHVnaW5zX21hbmFnZXIucHk=)
 | `90.68% <100%> (ø)` | |
   | 
[airflow/models/dagbag.py](https://codecov.io/gh/apache/airflow/pull/7128/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnYmFnLnB5)
 | `87% <100%> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/7128?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/7128?src=pr=footer). 
Last update 
[ea5853f...b612321](https://codecov.io/gh/apache/airflow/pull/7128?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] [airflow] dazza-codes commented on a change in pull request #7027: [WIP][AIRFLOW-6441] KnativeExecutor

2020-01-16 Thread GitBox
dazza-codes commented on a change in pull request #7027: [WIP][AIRFLOW-6441] 
KnativeExecutor
URL: https://github.com/apache/airflow/pull/7027#discussion_r367505849
 
 

 ##
 File path: airflow/cli/commands/knative_worker_command.py
 ##
 @@ -0,0 +1,67 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""knative worker command"""
+
+import signal
+import subprocess
+import sys
+import time
+
+from airflow.utils import cli as cli_utils
+
+
+@cli_utils.action_logging
+def knative_worker(args):
+"""
+Launches knative servers as Gunicorn processes
+@param args:
+"""
+num_workers = args.workers or 8
+# worker_timeout = (args.worker_timeout or
+#   conf.get('webserver', 'web_server_worker_timeout'))
+worker_timeout = 1
+hostname = args.hostname or "0.0.0.0"
+port = args.port or "8081"
+run_args = [
+'gunicorn',
+'-w', str(num_workers),
+'-k', 'sync',
+'-t', str(worker_timeout),
+'-b', str(hostname) + ':' + str(port),
+'-n', 'airflow-worker',
+'-c', 'python:airflow.www.gunicorn_config',
+'airflow.knative_worker.knative_task_runner:create_app()'
+]
+
+def monitor_gunicorn(gunicorn_master_proc):
+while gunicorn_master_proc.poll() is None:
+time.sleep(1)
+sys.exit(gunicorn_master_proc.returncode)
+
+def kill_proc():
+gunicorn_master_proc.terminate()
+gunicorn_master_proc.wait()
+sys.exit(0)
+
+gunicorn_master_proc = subprocess.Popen(run_args, close_fds=True)
+
+signal.signal(signal.SIGINT, kill_proc)
+signal.signal(signal.SIGTERM, kill_proc)
+
+monitor_gunicorn(gunicorn_master_proc)
 
 Review comment:
   If I understand this correctly, it needs to hang on to a process to monitor 
the worker and the monitoring is a blocking operation to poll for the status of 
the worker and cleanup as necessary.  The following suggestion is out of scope 
on this PR - it might be cool if an asyncio event loop could use coroutines to 
monitor the process state and create callbacks to handle cleanup.  Some related 
refs on this topic:
   - https://pymotw.com/3/asyncio/executors.html
 - event loop can manage blocking tasks with threads or process pools
   - 
https://gist.github.com/seglberg/0b4487b57b4fd425c56ad72aba9971be#file-grpc_asyncio-py-L21
 - interesting way to combine asyncio event loop with concurrent.futures 
executor model


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on issue #6792: [AIRFLOW-5930] Use cached-SQL query building for hot-path queries

2020-01-16 Thread GitBox
ashb commented on issue #6792: [AIRFLOW-5930] Use cached-SQL query building for 
hot-path queries
URL: https://github.com/apache/airflow/pull/6792#issuecomment-575220751
 
 
   Updated numbers from master branch. I'm not sure these will equate to _such_ 
a speed up in terms of actual task execution. Overall numbers are lower as I 
have a faster machine to run tests on now :)
   
   
   Concurrent DagRuns | Tasks | Before | After | Speedup
   -- | -- | -- | -- | --
   100 | 12 | 3.9545s (±0.140s) | 1.7174s (±0.045s) | x2.3
   100 | 40 | 12.1224s (±0.357s) | 9.0649s (±0.603s) | x1.3
   1000 | 12 | 17.7790s (±0.240s) | 8.3937s (±0.516s) | x1.5
   1000 | 40 | 67.1121s (±0.799s) | 44.6126s (±0.518s) | x1.5
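   
   For readers unfamiliar with the technique being benchmarked: one common way to cache SQL statement construction in SQLAlchemy is the `baked` extension, roughly as sketched below (illustrative only; whether this PR uses `sqlalchemy.ext.baked` or a different caching mechanism is not shown here, and the query itself is made up):
   ```python
   from sqlalchemy import bindparam
   from sqlalchemy.ext import baked

   from airflow.models import TaskInstance

   bakery = baked.bakery()

   def find_task_instance(session, dag_id, task_id):
       # The lambdas act as cache keys, so the SELECT is only built and compiled once.
       query = bakery(lambda s: s.query(TaskInstance))
       query += lambda q: q.filter(
           TaskInstance.dag_id == bindparam("dag_id"),
           TaskInstance.task_id == bindparam("task_id"),
       )
       return query(session).params(dag_id=dag_id, task_id=task_id).first()
   ```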


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-6568) Add some more entries (Emacs related files) to .gitignore

2020-01-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017061#comment-17017061
 ] 

ASF subversion and git services commented on AIRFLOW-6568:
--

Commit a611777356c940b4ebfda1b8c29ef7bd214d9e96 in airflow's branch 
refs/heads/master from Kousuke Saruta
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=a611777 ]

[AIRFLOW-6568] Add Emacs related files to .gitignore (#7175)



> Add some more entries (Emacs related files) to .gitignore 
> --
>
> Key: AIRFLOW-6568
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6568
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.10.8
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 2.0.0
>
>
> Emacs generates some types of backup files.
> They should be ignored by the repository.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6568) Add some more entries (Emacs related files) to .gitignore

2020-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017060#comment-17017060
 ] 

ASF GitHub Bot commented on AIRFLOW-6568:
-

kaxil commented on pull request #7175: [AIRFLOW-6568] Add some more entries 
(Emacs related files) to .gitignore 
URL: https://github.com/apache/airflow/pull/7175
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add some more entries (Emacs related files) to .gitignore 
> --
>
> Key: AIRFLOW-6568
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6568
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.10.8
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Emacs generates some types of backup files.
> They should be ignored by the repository.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-6568) Add some more entries (Emacs related files) to .gitignore

2020-01-16 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-6568.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

> Add some more entries (Emacs related files) to .gitignore 
> --
>
> Key: AIRFLOW-6568
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6568
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.10.8
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 2.0.0
>
>
> Emacs generates some types of backup files.
> They should be ignored by the repository.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] kaxil merged pull request #7175: [AIRFLOW-6568] Add some more entries (Emacs related files) to .gitignore

2020-01-16 Thread GitBox
kaxil merged pull request #7175: [AIRFLOW-6568] Add some more entries (Emacs 
related files) to .gitignore 
URL: https://github.com/apache/airflow/pull/7175
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] nuclearpinguin commented on a change in pull request #7146: [AIRFLOW-6541] Use EmrJobFlowSensor for other states

2020-01-16 Thread GitBox
nuclearpinguin commented on a change in pull request #7146: [AIRFLOW-6541] Use 
EmrJobFlowSensor for other states
URL: https://github.com/apache/airflow/pull/7146#discussion_r367496595
 
 

 ##
 File path: airflow/contrib/sensors/emr_job_flow_sensor.py
 ##
 @@ -16,48 +16,96 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
 from airflow.contrib.hooks.emr_hook import EmrHook
 from airflow.contrib.sensors.emr_base_sensor import EmrBaseSensor
 from airflow.utils.decorators import apply_defaults
 
 
 class EmrJobFlowSensor(EmrBaseSensor):
 """
-Asks for the state of the JobFlow until it reaches a terminal state.
+Asks for the state of the EMR JobFlow (Cluster) until it reaches
+any of the target states.
 If it fails the sensor errors, failing the task.
 
+With the default target states, sensor waits cluster to be terminated.
+When target_states is set to ['RUNNING', 'WAITING'] sensor waits
+until job flow to be ready (after 'STARTING' and 'BOOTSTRAPPING' states)
+
 :param job_flow_id: job_flow_id to check the state of
 :type job_flow_id: str
+
+:param target_states: the target states, sensor waits until
+job flow reaches any of these states
+:type target_states: list[str]
+
+:param failed_states: the failure states, sensor fails when
+job flow reaches any of these states
+:type failed_states: list[str]
 """
 
-NON_TERMINAL_STATES = ['STARTING', 'BOOTSTRAPPING', 'RUNNING',
-   'WAITING', 'TERMINATING']
-FAILED_STATE = ['TERMINATED_WITH_ERRORS']
-template_fields = ['job_flow_id']
+template_fields = ['job_flow_id', 'target_states', 'failed_states']
 template_ext = ()
 
 @apply_defaults
 def __init__(self,
  job_flow_id,
+ target_states=None,
+ failed_states=None,
  *args,
  **kwargs):
 super().__init__(*args, **kwargs)
 self.job_flow_id = job_flow_id
+if target_states is None:
+target_states = ['TERMINATED']
+self.target_states = target_states
+if failed_states is None:
+failed_states = ['TERMINATED_WITH_ERRORS']
+self.failed_states = failed_states
 
 def get_emr_response(self):
 
 Review comment:
   ```suggestion
   def get_emr_response(self) -> Dict[str, Any]:
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] nuclearpinguin commented on a change in pull request #7146: [AIRFLOW-6541] Use EmrJobFlowSensor for other states

2020-01-16 Thread GitBox
nuclearpinguin commented on a change in pull request #7146: [AIRFLOW-6541] Use 
EmrJobFlowSensor for other states
URL: https://github.com/apache/airflow/pull/7146#discussion_r367495835
 
 

 ##
 File path: airflow/contrib/sensors/emr_job_flow_sensor.py
 ##
 @@ -16,48 +16,96 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
 from airflow.contrib.hooks.emr_hook import EmrHook
 from airflow.contrib.sensors.emr_base_sensor import EmrBaseSensor
 from airflow.utils.decorators import apply_defaults
 
 
 class EmrJobFlowSensor(EmrBaseSensor):
 """
-Asks for the state of the JobFlow until it reaches a terminal state.
+Asks for the state of the EMR JobFlow (Cluster) until it reaches
+any of the target states.
 If it fails the sensor errors, failing the task.
 
+With the default target states, the sensor waits for the cluster to be terminated.
+When target_states is set to ['RUNNING', 'WAITING'], the sensor waits
+until the job flow is ready (after the 'STARTING' and 'BOOTSTRAPPING' states).
+
 :param job_flow_id: job_flow_id to check the state of
 :type job_flow_id: str
+
+:param target_states: the target states, sensor waits until
+job flow reaches any of these states
+:type target_states: list[str]
+
+:param failed_states: the failure states, sensor fails when
+job flow reaches any of these states
+:type failed_states: list[str]
 """
 
-NON_TERMINAL_STATES = ['STARTING', 'BOOTSTRAPPING', 'RUNNING',
-   'WAITING', 'TERMINATING']
-FAILED_STATE = ['TERMINATED_WITH_ERRORS']
-template_fields = ['job_flow_id']
+template_fields = ['job_flow_id', 'target_states', 'failed_states']
 template_ext = ()
 
 @apply_defaults
 def __init__(self,
  job_flow_id,
+ target_states=None,
+ failed_states=None,
  *args,
  **kwargs):
 super().__init__(*args, **kwargs)
 self.job_flow_id = job_flow_id
+if target_states is None:
+target_states = ['TERMINATED']
 
 Review comment:
   What about oneliner?
   ```suggestion
   target_states = target_states or ['TERMINATED']
   ```
   
   And shouldn't we extend the list to `['STARTING', 'BOOTSTRAPPING', 
'RUNNING', 'WAITING', 'TERMINATING']` to preserve backward compatibility?
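   
   As an aside, a hedged usage sketch of the new parameters (the DAG wiring and ids are made up; the import path is the file touched by this diff):
   ```python
   from airflow.contrib.sensors.emr_job_flow_sensor import EmrJobFlowSensor

   # Wait until the cluster is ready instead of waiting for termination.
   wait_for_cluster = EmrJobFlowSensor(
       task_id="wait_for_cluster",
       job_flow_id="{{ task_instance.xcom_pull(task_ids='create_job_flow') }}",
       target_states=["RUNNING", "WAITING"],
       failed_states=["TERMINATED_WITH_ERRORS"],
   )
   ```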


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

