[jira] [Commented] (AIRFLOW-5509) Support PATCH method in `DatabricksHook`

2019-12-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998500#comment-16998500
 ] 

jack commented on AIRFLOW-5509:
---

[~rosalyntaylor] you can always submit your PRs for improvements. No need to 
ask in advance.

> Support PATCH method in `DatabricksHook`
> 
>
> Key: AIRFLOW-5509
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5509
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.5
>Reporter: Rosalyn Taylor
>Assignee: Rosalyn Taylor
>Priority: Major
>
> The `DatabricksHook` [0] currently supports only GET and POST HTTP 
> operations against the Databricks API: [1]
> {code:python}
> if method == 'GET':
> request_func = requests.get
> elif method == 'POST':
> request_func = requests.post
> else:
> raise AirflowException('Unexpected HTTP Method: ' + method)
> {code}
> Some of the Databricks APIs require PATCH operations. [2] This ticket 
> proposes adding PATCH support to the `_do_api_call()` method of the 
> `DatabricksHook` class. [3]
>  If this proposal is suitable, I'm happy to submit a PR.
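> A minimal sketch of the proposed change (an assumption about its shape, not 
> the final patch; {{requests.patch}} is part of the requests library):
> {code:python}
> if method == 'GET':
>     request_func = requests.get
> elif method == 'POST':
>     request_func = requests.post
> elif method == 'PATCH':
>     # hypothetical addition proposed by this ticket
>     request_func = requests.patch
> else:
>     raise AirflowException('Unexpected HTTP Method: ' + method)
> {code}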
> [0]: 
> [https://github.com/apache/airflow/blob/master//airflow/contrib/hooks/databricks_hook.py#L83:7]
> [1]: 
> [https://github.com/apache/airflow/blob/master//airflow/contrib/hooks/databricks_hook.py#L164-L169]
> [2]: 
> [https://docs.databricks.com/api/latest/scim.html#update-user-by-id-patch]
> [3]: 
> [https://github.com/apache/airflow/blob/master//airflow/contrib/hooks/databricks_hook.py#L136]





[jira] [Commented] (AIRFLOW-6174) Airflow Databricks Hook requires host in extras

2019-12-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998498#comment-16998498
 ] 

jack commented on AIRFLOW-6174:
---

This change was introduced in [https://github.com/apache/airflow/pull/5635] by 
[~neilpate...@gmail.com]

 

> Airflow Databricks Hook requires host in extras
> ---
>
> Key: AIRFLOW-6174
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6174
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.10.6
>Reporter: Costas Piliotis
>Priority: Trivial
>
> databricks_hook was changed to require host in the extras of the connection.  
>  
>  
> [https://github.com/apache/airflow/blob/master/airflow/contrib/hooks/databricks_hook.py#L154]
>  
> Maybe first check and see if host is provided in the connection before 
> searching extras?    
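> A minimal sketch of that check (hypothetical, untested; {{host}} and 
> {{extra_dejson}} are real attributes of Airflow's Connection model):
> {code:python}
> # Prefer the connection's host field, fall back to the extras JSON
> host = self.databricks_conn.host or \
>     self.databricks_conn.extra_dejson.get('host')
> {code}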





[jira] [Commented] (AIRFLOW-5171) Random task gets stuck in queued state despite all dependencies met

2019-12-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998489#comment-16998489
 ] 

jack commented on AIRFLOW-5171:
---

I can confirm this bug also exists in 1.10.3 + LocalExecutor.
Sadly, for us it also seems to be random in nature.
I have not yet been able to find common ground, let alone an example to 
reproduce :( 

> Random task gets stuck in queued state despite all dependencies met
> ---
>
> Key: AIRFLOW-5171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5171
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.10.2
>Reporter: Matt C. Wilson
>Priority: Major
> Attachments: Airflow - Log.png, Airflow - Task Instance Details.htm
>
>
> We are experiencing an issue similar to that reported in AIRFLOW-1641 and 
> AIRFLOW-4586.  We run two parallel dags, both using a common set of pools, 
> both using LocalExecutor.
> What we are seeing is once every couple dozen dag runs, a task will reach the 
> `queued` status and not continue into a `running` state once a pool slot is 
> open / dependencies are filled.
> Investigating the task instance details confirms the same; Airflow reports 
> that it expects the task to commence shortly once resources are available.  
> See attachment. [^Airflow - Task Instance Details.htm]
> While tasks are in this state, the sibling parallel dag is able to flow 
> completely, even multiple times through.  So we know the issue is not with 
> pool constraints, executor issues, etc.  The problem really seems to be that 
> Airflow has simply lost track of the task and failed to start it.
> Clearing the task state has no effect - the task does not get moved back into 
> a `scheduled` or `queued` or `running` state, it just stays at the `none` 
> state.  The task must be marked as `failed` or `success` to resume normal dag 
> flow.
> This issue has been causing sporadic production degradation for us, with no 
> obvious avenue for troubleshooting.  It's not clear if changing the 
> `dagbag_import_timeout` (as reported in 1641) will help because our task has 
> no log showing in the Airflow UI.   See screenshot.   !Airflow - Log.png!
> I'm open to all recommendations to try to get to the bottom of this.  Please 
> let me know if there is any log data or other info I can provide.
>  





[jira] [Commented] (AIRFLOW-5557) Mongo Replica connection Hostname causing a crash

2019-12-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998316#comment-16998316
 ] 

jack commented on AIRFLOW-5557:
---

Additional info on replica-set hostnames in MongoDB: 
[https://docs.mongodb.com/manual/tutorial/change-hostnames-in-a-replica-set/]

[~zuku1985] care for another? Same area: building the correct URI.
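
For reference, the failure is reproducible with urllib alone (a minimal 
sketch; the mongodb:// scheme prefix is assumed):

{code:python}
from urllib.parse import urlparse

# Multi-host replica-set URIs confuse urlparse: everything after the first
# ':' in the netloc is treated as the port, so int() blows up.
uri = 'mongodb://hello-1:27017,hello-2:27017,hello-3:27017/?replicaSet=MongoReplica'
urlparse(uri).port  # ValueError: invalid literal for int() with base 10
{code}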

> Mongo Replica connection Hostname causing a crash 
> --
>
> Key: AIRFLOW-5557
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5557
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 1.10.5
>Reporter: SANGAMESH PATIL
>Priority: Major
>
> Hi,
>  
> I tried to provide the mongo hostname in replica-set format as follows and am 
> hitting this error:
>  *hello-1:27017,hello-2:27017,hello-3:27017/?replicaSet=MongoReplica*
> Traceback (most recent call last):
> File "/usr/local/bin/airflow", line 32, in 
>    args.func(args)
> File "/usr/local/lib/python3.6/site-packages/airflow/utils/cli.py", line 74, 
> in wrapper
>    return f(*args, **kwargs)
> File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 523, 
> in run
>    _run(args, dag, ti)
> File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 442, 
> in _run
>    pool=args.pool,
> File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 73, 
> in wrapper
>    return func(*args, **kwargs)
> File "/usr/local/lib/python3.6/site-packages/airflow/models/__init__.py", 
> line 1441, in _run_raw_task
>    result = task_copy.execute(context=context)
> File 
> "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py",
>  line 112, in execute
>    return_value = self.execute_callable()
> File 
> "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py",
>  line 117, in execute_callable
>    return self.python_callable(*self.op_args, **self.op_kwargs)
> File "/usr/local/airflow/dags/tasks/hooks/mongo.py", line 166, in __init__
>    self._hook = MongoHook(conn_id="my_db")
> File 
> "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/mongo_hook.py", 
> line 40, in __init__
>    self.connection = self.get_connection(conn_id)
> File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", 
> line 80, in get_connection
>    conn = random.choice(cls.get_connections(conn_id))
> File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", 
> line 71, in get_connections
>    conn = cls._get_connection_from_env(conn_id)
> File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", 
> line 66, in _get_connection_from_env
>    conn = Connection(conn_id=conn_id, uri=environment_uri)
> File "", line 4, in __init__
> File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/state.py", line 
> 428, in _initialize_instance
>    manager.dispatch.init_failure(self, args, kwargs)
> File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", 
> line 67, in __exit__
>    compat.reraise(exc_type, exc_value, exc_tb)
> File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 
> 277, in reraise
>    raise value
> File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/state.py", line 
> 425, in _initialize_instance
>    return manager.original_init(*mixed[1:], **kwargs)
> File "/usr/local/lib/python3.6/site-packages/airflow/models/connection.py", 
> line 117, in __init__
>    self.parse_from_uri(uri)
> File "/usr/local/lib/python3.6/site-packages/airflow/models/connection.py", 
> line 142, in parse_from_uri
>    self.port = uri_parts.port
> File "/usr/local/lib/python3.6/urllib/parse.py", line 169, in port
>    port = int(port, 10)





[jira] [Commented] (AIRFLOW-3534) KubernetesPodOperator breaks with active log-collection for long running tasks

2019-12-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998279#comment-16998279
 ] 

jack commented on AIRFLOW-3534:
---

Could be related to https://issues.apache.org/jira/browse/AIRFLOW-5571

> KubernetesPodOperator breaks with active log-collection for long running tasks
> --
>
> Key: AIRFLOW-3534
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3534
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.1
>Reporter: Christian Thiel
>Priority: Major
>  Labels: kubernetes
>
> If a KubernetesPodOperator is started with get_logs=True, the pod breaks if 
> no logs are produced after ~30 seconds due to http client timeout.
> The error occurs in two ways:
> 1. If the script doesn't write anything to stdout, there are three WARNINGs 
> from the connection pool trying to get the logs: 
> {code:python}
> [2018-12-17 15:23:15,092] {{logging_mixin.py:95}} WARNING - 2018-12-17 
> 15:23:15,092 WARNING Retrying (Retry(total=2, connect=None, read=None, 
> redirect=None, status=None)) after connection broken by 
> 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed 
> connection without response',))': 
> /k8s/clusters/local/api/v1/namespaces/my-namespace/pods/my-pod/log?container=base=True=10
> {code}
> Followed by a {code:python}http.client.RemoteDisconnected: Remote end closed 
> connection without response{code}
> originating from _monitor_pod in /contrib/kubernetes/pod_launcher.py
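> One possible mitigation (an assumption, not a confirmed fix): pass a read 
> timeout to the log call and re-establish the stream when the server goes 
> quiet. {{read_namespaced_pod_log}} and {{_request_timeout}} are real 
> kubernetes-client APIs; {{pod_is_running}} is a hypothetical helper:
> {code:python}
> import urllib3
> from kubernetes import client, config
> 
> config.load_incluster_config()
> v1 = client.CoreV1Api()
> 
> while pod_is_running(pod):  # pod_is_running is a hypothetical helper
>     try:
>         logs = v1.read_namespaced_pod_log(
>             name=pod.name, namespace=pod.namespace, container='base',
>             follow=True, tail_lines=10, _preload_content=False,
>             _request_timeout=(10, 60))  # (connect, read) timeouts in seconds
>         for line in logs:
>             print(line.decode('utf-8', errors='replace'))
>     except urllib3.exceptions.ReadTimeoutError:
>         continue  # server went quiet; reconnect and keep tailing
> {code}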
> Full Traceback:
> {code:python}
> Traceback (most recent call last):
>   File "/opt/conda/lib/python3.6/site-packages/airflow/models.py", line 1659, 
> in _run_raw_task
> result = task_copy.execute(context=context)
>   File 
> "/opt/conda/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py",
>  line 123, in execute
> get_logs=self.get_logs)
>   File 
> "/opt/conda/lib/python3.6/site-packages/airflow/contrib/kubernetes/pod_launcher.py",
>  line 90, in run_pod
> return self._monitor_pod(pod, get_logs)
>   File 
> "/opt/conda/lib/python3.6/site-packages/airflow/contrib/kubernetes/pod_launcher.py",
>  line 102, in _monitor_pod
> _preload_content=False)
>   File 
> "/opt/conda/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py",
>  line 18583, in read_namespaced_pod_log
> (data) = self.read_namespaced_pod_log_with_http_info(name, namespace, 
> **kwargs)
>   File 
> "/opt/conda/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py",
>  line 18689, in read_namespaced_pod_log_with_http_info
> collection_formats=collection_formats)
>   File 
> "/opt/conda/lib/python3.6/site-packages/kubernetes/client/api_client.py", 
> line 321, in call_api
> _return_http_data_only, collection_formats, _preload_content, 
> _request_timeout)
>   File 
> "/opt/conda/lib/python3.6/site-packages/kubernetes/client/api_client.py", 
> line 155, in __call_api
> _request_timeout=_request_timeout)
>   File 
> "/opt/conda/lib/python3.6/site-packages/kubernetes/client/api_client.py", 
> line 342, in request
> headers=headers)
>   File "/opt/conda/lib/python3.6/site-packages/kubernetes/client/rest.py", 
> line 231, in GET
> query_params=query_params)
>   File "/opt/conda/lib/python3.6/site-packages/kubernetes/client/rest.py", 
> line 205, in request
> headers=headers)
>   File "/opt/conda/lib/python3.6/site-packages/urllib3/request.py", line 68, 
> in request
> **urlopen_kw)
>   File "/opt/conda/lib/python3.6/site-packages/urllib3/request.py", line 89, 
> in request_encode_url
> return self.urlopen(method, url, **extra_kw)
>   File "/opt/conda/lib/python3.6/site-packages/urllib3/poolmanager.py", line 
> 322, in urlopen
> response = conn.urlopen(method, u.request_uri, **kw)
>   File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", 
> line 667, in urlopen
> **response_kw)
>   File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", 
> line 667, in urlopen
> **response_kw)
>   File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", 
> line 667, in urlopen
> **response_kw)
>   File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", 
> line 638, in urlopen
> _stacktrace=sys.exc_info()[2])
>   File "/opt/conda/lib/python3.6/site-packages/urllib3/util/retry.py", line 
> 398, in increment
> raise MaxRetryError(_pool, url, error or ResponseError(cause))
> urllib3.exceptions.MaxRetryError: 
> HTTPSConnectionPool(host='rancher.benteler.net', port=443): Max retries 
> exceeded with url: 
> /k8s/clusters/local/api/v1/namespaces/ou-seamless-airflow-ops/pods/sql-fmv-collector-s3-9074ac52/log?container=base=True=10
>  (Caused by 

[jira] [Commented] (AIRFLOW-918) Improve bulk_load function for MySqlHook

2019-12-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997977#comment-16997977
 ] 

jack commented on AIRFLOW-918:
--

[~ash] can be closed as duplicate of 
https://issues.apache.org/jira/browse/AIRFLOW-5921

Also this is a child of https://issues.apache.org/jira/browse/AIRFLOW-3886

> Improve bulk_load function for MySqlHook
> 
>
> Key: AIRFLOW-918
> URL: https://issues.apache.org/jira/browse/AIRFLOW-918
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.7.1.3
>Reporter: Ali Uz
>Priority: Minor
>  Labels: easyfix, patch
>
> I think we can improve the `bulk_load` function in MySqlHook by adding a few 
> more parameters. For example, if I want to run a LOAD DATA command like the 
> following:
> {code:sql}
> LOAD DATA LOCAL INFILE 'abc.csv' INTO TABLE abc
> FIELDS TERMINATED BY ',' 
> ENCLOSED BY '"' 
> LINES TERMINATED BY '\r\n'
> IGNORE 1 LINES
> {code}
> I would expect to be able to supply the field delimiter, the enclosing 
> quote, the line terminator and the number of lines to ignore as parameters.
> The current function only applies the following command:
> {code:sql}
> LOAD DATA LOCAL INFILE 'abc.csv' INTO TABLE abc
> {code}
> It would be great if we could extend it.
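> A hedged sketch of what the extended call could look like (the parameter 
> names are illustrative, not an agreed API):
> {code:python}
> def bulk_load(self, table, tmp_file, fields_terminated_by=',',
>               enclosed_by='"', lines_terminated_by='\r\n', ignore_lines=0):
>     # Hypothetical extension of MySqlHook.bulk_load: the driver quotes the
>     # %s string parameters, while table name and line count are formatted in.
>     conn = self.get_conn()
>     cur = conn.cursor()
>     cur.execute(
>         "LOAD DATA LOCAL INFILE %s INTO TABLE {table} "
>         "FIELDS TERMINATED BY %s ENCLOSED BY %s "
>         "LINES TERMINATED BY %s "
>         "IGNORE {n} LINES".format(table=table, n=int(ignore_lines)),
>         (tmp_file, fields_terminated_by, enclosed_by, lines_terminated_by))
>     conn.commit()
> {code}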





[jira] [Commented] (AIRFLOW-6218) PapermillOperator has no functional Jinja support

2019-12-16 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997473#comment-16997473
 ] 

jack commented on AIRFLOW-6218:
---

[~bolke] do you know what is the problem?

> PapermillOperator has no functional Jinja support
> -
>
> Key: AIRFLOW-6218
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6218
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 2.0.0
>Reporter: Michiel Ghyselinck
>Priority: Major
> Fix For: 2.0.0
>
>
> I think there is something wrong with the PapermillOperator 
> ([https://github.com/apache/airflow/blob/master/airflow/operators/papermill_operator.py]).
> The Jinja variables aren't replaced correctly.
> For example, if I have an output notebook 
> {{/home/out-\{{ execution_date }}.ipynb}} it will be 
> rendered as {{/home/out-\{ execution_date }.ipynb}}.
> I temporarily fixed this by adding {{template_fields = ('input_nb', 
> 'output_nb')}} to the PapermillOperator.
> And I've changed the url parameter of the NoteBooks, also in 
> the PapermillOperator: 
> {{self.inlets.append(NoteBook(url=self.input_nb, 
> parameters=parameters)) 
> self.outlets.append(NoteBook(url=self.output_nb))}} but this 
> is probably not the correct fix...
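> The first workaround above, written out as a sketch (the reporter's 
> temporary fix, not necessarily the upstream change):
> {code:python}
> class PapermillOperator(BaseOperator):
>     # declaring the fields as templated lets Jinja render them
>     template_fields = ('input_nb', 'output_nb')
> {code}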





[jira] [Commented] (AIRFLOW-3189) DbAPI get_uri returns invalid uri if schema is None

2019-12-16 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997402#comment-16997402
 ] 

jack commented on AIRFLOW-3189:
---

[~zuku] this seems close to another PR you submitted; you might be interested in this one.

> DbAPI get_uri returns invalid uri if schema is None
> ---
>
> Key: AIRFLOW-3189
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3189
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.10.0
>Reporter: Thomas Haederle
>Priority: Minor
>
> the current implementation of get_uri attaches the schema name to the URI 
> even if no schema was specified.
> This leads to errors in downstream functions such as returning an invalid 
> sqlalchemy engine.
> we should add a simple check: when the schema is None, it should not be 
> appended to the URI.
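> A hedged sketch of that check (simplified; the real {{get_uri}} also handles 
> login, password and port):
> {code:python}
> uri = '{}://{}'.format(conn.conn_type, conn.host)
> if conn.schema:
>     uri += '/{}'.format(conn.schema)  # only append the schema when it is set
> return uri
> {code}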





[jira] [Commented] (AIRFLOW-2940) Code Walkthrough keeping in mind the newbies who have a hard time figuring out where to start reading the code from.

2019-12-16 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997392#comment-16997392
 ] 

jack commented on AIRFLOW-2940:
---

The docs are very extensive now. I think this is no longer relevant.

> Code Walkthrough keeping in mind the newbies who have a hard time figuring 
> out where to start reading the code from.
> 
>
> Key: AIRFLOW-2940
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2940
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: documentation, project-management
>Affects Versions: 1.7.1
> Environment: None
>Reporter: Nidhi Chourasia
>Priority: Minor
>  Labels: documentation
> Fix For: 1.8.0
>
>
> I had gone through the documentation for Airflow. It mentions the 
> implementation of the operators and the other features it offers. For a 
> newbie like me who intends to take a deeper dive into the workings of 
> operators behind the scenes, going through the code is somewhat challenging. 
> Any documentation with pointers will help.





[jira] [Commented] (AIRFLOW-5106) POST request are not working of Experimental Rest API

2019-12-16 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997390#comment-16997390
 ] 

jack commented on AIRFLOW-5106:
---

[~Nimesha] can you be more specific?

> POST request are not working of Experimental Rest API
> -
>
> Key: AIRFLOW-5106
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5106
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 1.10.3
>Reporter: Nimesha Edirisinghe
>Priority: Major
> Fix For: 1.10.3
>
>






[jira] [Commented] (AIRFLOW-4525) Trigger Dag Operator causes duplicate key exceptions and can cause runaway dag spawning as it is not atomic at the DB level (on Postgres at least.)

2019-12-11 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993357#comment-16993357
 ] 

jack commented on AIRFLOW-4525:
---

I'm not quite sure I understand why we get orphaned sub-dag entries, nor why 
you claim that we now get more copies of them.

> Trigger Dag Operator causes duplicate key exceptions and can cause runaway 
> dag spawning as it is not atomic at the DB level (on Postgres at least.)
> ---
>
> Key: AIRFLOW-4525
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4525
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, DagRun
>Affects Versions: 1.10.3
>Reporter: Tony Brookes
>Priority: Blocker
>
> When using the TriggerDagRunOperator there is a problem in the code which 
> loops round subdags scheduling them.  You will not see this issue if you only 
> have one level of sub dag, but if your sub dags have sub dags then you will 
> see it.
> The code pops an item off the list (non unique) and schedules it.  It then 
> appends all sub dags of the dag it popped off the list to the current list.  
> It keeps doing this until the list is empty.
> The problem is that <>.subdags returns _*all*_ subdags at 
> _*all*_ levels.  So when you process a <> 
> it calls <>.subdags and once again this will append all 
> its subdags, _*which are already in the list*_.  Thus you are now certain you 
> will get a duplicate key exception as the same dag ID and run ID are present 
> twice.
> Up to and including 1.10.2 this is not a significant problem most of the 
> time.  You see the duplicate key errors in the logs but it does not cause the 
> operator to raise and hence the task actually succeeds.  That said, you do 
> get a load of "running" sub dags in the console which never really do 
> anything as they aren't invoked from the parent dag when it wants them to run 
> and hence have no "task instance" connection to that dag.
> *+However, in 1.10.3 this causes havoc.+*
> Firstly, it no longer exits cleanly.  It causes the operator to raise an 
> error and so it fails.  Worse, since the statements it has executed to 
> schedule each dag are _*not*_ in the same transaction, all the dags before the 
> first duplicate _*are triggered*_.  But since the task will subsequently be 
> retried (if configured) _*they will be triggered again.*_  Because the logic 
> to generate the run ID uses now() as part of the key, subsequent 
> invocations will have a different run ID and hence will cause all the dags 
> before the first duplicate exception to be scheduled repeatedly, up to the 
> maximum retry limit.  You still get all the orphaned sub dag entries I 
> mentioned from 1.10.2, but you get many, many copies of them.
> I'm not sure what the best fix is (or if it's my place to suggest one) but 
> from what I've seen the cleanest approach is either to use a set, to avoid 
> duplicate entries, rather than the current list-based approach, OR continue to 
> use the list with its "pop" semantics but keep track of items already 
> processed and avoid re-appending them, as sketched below.
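> A sketch of the set-based idea (illustrative only; {{trigger}} stands in for 
> whatever actually creates the DagRun):
> {code:python}
> seen = set()
> dags_to_trigger = [dag]
> while dags_to_trigger:
>     current = dags_to_trigger.pop()
>     if current.dag_id in seen:
>         continue  # already scheduled once; skip the duplicate
>     seen.add(current.dag_id)
>     trigger(current)  # hypothetical helper that creates the DagRun
>     dags_to_trigger.extend(current.subdags)
> {code}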
> This would fix the current problem, but to be honest it feels semantically 
> *_incorrect_* to trigger the sub dags in this way.  The top level dag invokes 
> the sub dags as task instances like any other and you're going behind its 
> back invoking them this way.  Moreover, the semantic contract of the 
> TriggerDagRunOperator is that it takes a single dag ID as input, implicitly 
> creating the expectation that this is the _*only dag which will be 
> triggered.*_  Scheduling the sub dags as part of doing this feels wrong and 
> actually creates an error whilst doing nothing to help the operation of the 
> platform (unless there is a different configuration set up I am not thinking 
> of which is entirely possible.)
> But as far as I can discern, if you _*only*_ trigger the top level dag you've 
> been _*asked*_ to trigger then actually, everything will work just fine.  The 
> SubDagOperator which wraps the sub dags will trigger the sub dag anyway at 
> the right time, based on whatever dependencies are in the top level dag 
> (which might be none, in which case any sub dags will get scheduled 
> automatically.  The reason I know this of course is that the first time you 
> trigger the top level DAG in the UI, only one row is written to the dag_run 
> table, only the top level dag is triggered, and yet, it works just fine...
> If there is some scenario which still requires the sub dags to be 
> triggered, I think it's important that this sort of operator is atomic (or at 
> the very least idempotent.)  Otherwise you can risk significant issues in a 
> production environment with "over-triggering" Dags.  

[jira] [Commented] (AIRFLOW-4745) failed filtering when accessing DAG Runs page from DAG information page

2019-12-11 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993346#comment-16993346
 ] 

jack commented on AIRFLOW-4745:
---

[~hkak03key] you are welcome to create a PR

> failed filtering when accessing DAG Runs page from DAG information page
> ---
>
> Key: AIRFLOW-4745
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4745
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui, webserver
>Affects Versions: 1.10.3
>Reporter: Hikaru Hoshizaki
>Assignee: Hikaru Hoshizaki
>Priority: Minor
>
> When clicking "{{schedule: X day, hh:mm:ss}}" on a DAG information page such 
> as Tree View, we jump to the DAG Runs page, but the filter is not applied.
> This is easy to implement, so I'll create a PR.





[jira] [Commented] (AIRFLOW-3794) Decommission modules in `airflow/contrib/auth/backends/`, other than `password_auth`

2019-12-10 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992514#comment-16992514
 ] 

jack commented on AIRFLOW-3794:
---

Do we need this? contrib should be removed following AIP-21.

> Decommission modules in `airflow/contrib/auth/backends/`, other than 
> `password_auth`
> 
>
> Key: AIRFLOW-3794
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3794
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Xiaodong Deng
>Assignee: Xiaodong Deng
>Priority: Major
>
> These modules are only applicable for the Flask-Admin based UI, which is 
> deprecated in master branch.
> `password_auth` will be ignored in this task, as the decision to remove or 
> refactor it is still under discussion.





[jira] [Commented] (AIRFLOW-5597) Linkify urls in task instance log

2019-12-09 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991926#comment-16991926
 ] 

jack commented on AIRFLOW-5597:
---

[~higrys] can this be set to 1.10.7? It has no fix version set.

> Linkify urls in task instance log
> -
>
> Key: AIRFLOW-5597
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5597
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: webserver
>Affects Versions: 1.10.5
>Reporter: Xiao Zhu
>Assignee: Xiao Zhu
>Priority: Minor
>
> It would be great if urls in task instance log can be linkified thus user can 
> click on the links and open them instead of having to copy and paste.





[jira] [Commented] (AIRFLOW-2324) View SubDags on Home Page

2019-12-09 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991919#comment-16991919
 ] 

jack commented on AIRFLOW-2324:
---

Why would you want that?
SubDAGs are part of a greater DAG; what value would there be in seeing them in 
the main list?

> View SubDags on Home Page
> -
>
> Key: AIRFLOW-2324
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2324
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: ui
>Affects Versions: 1.10.0
>Reporter: vishnu srivastava
>Assignee: vishnu srivastava
>Priority: Major
> Fix For: 1.10.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> View SubDag links in the Home page as a collapsible list. This needs to be 
> set via airflow.cfg





[jira] [Commented] (AIRFLOW-2432) Templated fields containing password / tokens is displaying in plain text on UI

2019-12-09 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991915#comment-16991915
 ] 

jack commented on AIRFLOW-2432:
---

duplicate of https://issues.apache.org/jira/browse/AIRFLOW-4576

> Templated fields containing password / tokens is displaying in plain text on 
> UI
> ---
>
> Key: AIRFLOW-2432
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2432
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Reporter: John Cheng
>Priority: Major
> Attachments: templated field.PNG
>
>
> I am trying to pass a password to a bash operator with env.
> However, env is a templated field, so it will display my password in plain 
> text on the UI.
> !templated field.PNG!





[jira] [Commented] (AIRFLOW-4264) Would it be possible to make DummyOperator a fairly distinct color then other types of operators?

2019-12-09 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991903#comment-16991903
 ] 

jack commented on AIRFLOW-4264:
---

[~ash] is there something to do here?

> Would it be possible to make DummyOperator a fairly distinct color then other 
> types of operators?
> -
>
> Key: AIRFLOW-4264
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4264
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: joyce chan
>Priority: Trivial
>
> Sometimes it's hard to tell at a quick glance on the graph view that a task 
> is of type DummyOperator, unless I've given it a distinctive name; a distinct 
> color for it would be very helpful.
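> For what it's worth, a per-DAG workaround already exists (a sketch: 
> {{ui_color}} is the real BaseOperator attribute the graph view uses; the 
> subclass and the color value are made up):
> {code:python}
> from airflow.operators.dummy_operator import DummyOperator
> 
> class ColoredDummyOperator(DummyOperator):
>     ui_color = '#e8f7e4'  # hypothetical light green for quick scanning
> {code}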





[jira] [Commented] (AIRFLOW-4576) Rendered Template & email_on_failure displays password variable in clear text

2019-12-09 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991879#comment-16991879
 ] 

jack commented on AIRFLOW-4576:
---

If I understand you correctly, you expect that encrypted variables will not be 
shown in the Rendered Template view.
I wonder if they are also shown in the log.

> Rendered Template & email_on_failure displays password variable in clear text
> -
>
> Key: AIRFLOW-4576
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4576
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.3
> Environment: Linux
>Reporter: Raj Sasidharan
>Priority: Critical
> Attachments: dag_rendered_template.JPG
>
>
> I have a DAG with a SSHOperator, which uses a ssh_conn_id to run the below 
> command. As shown below, I am using Airflow Variables to pass credentials to 
> the script that needs to run.
>  *tac_job_run_command = "\{{ var.value.tac_metaservlet_path 
> }}/MetaServletAirflowCaller.sh --tac-url=http://\{{ var.value.tac_server_ip 
> }}:8080/tac/ --json-params='\{\"authPass\":\"{{ var.value.tac_tadmin_password 
> }}\",\"authUser\":\"tad...@abc.com\",\"taskId\":\{{ 
> ti.xcom_pull(\"get_tac_job_id\")[0] }}}' "*
> The password variable (tac_tadmin_password), in the UI's variables screen 
> shows as * and all works good, but once the job has run, the SSHOperator 
> task's Rendered Template section displays the command with the variable 
> values and also displays the password (tac_tadmin_password) in clear text. Is 
> there any way we can avoid this or is this an issue that needs to be fixed?
> If the DAG fails, I have email_on_failure set to True, and the email also 
> ends up displaying the rendered template with password in clear text.
>  





[jira] [Commented] (AIRFLOW-5458) Flask-AppBuilder shows critical security vulnerability

2019-12-09 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991877#comment-16991877
 ] 

jack commented on AIRFLOW-5458:
---

I think there is an open PR for this: [https://github.com/apache/airflow/pull/6607] 

> Flask-AppBuilder shows critical security vulnerability
> --
>
> Key: AIRFLOW-5458
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5458
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 1.10.5
>Reporter: Souvik Ghosh
>Priority: Major
> Fix For: 1.10.7
>
>
> Hello,
> our security team has detected a vulnerability in Flask-AppBuilder < 2.0.0 
> with a CVSS score of 9.8 and recommends moving to a version > 2.0. Since the 
> version is restricted in Airflow's setup.py, I am wondering if it can be 
> bumped to 2.0.0, where no vulnerability is reported.
>  
> Thanks for your help





[jira] [Commented] (AIRFLOW-3535) Airflow should collect display names, not Firstname / Lastname

2019-12-01 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985507#comment-16985507
 ] 

jack commented on AIRFLOW-3535:
---

Duplicate of  https://issues.apache.org/jira/browse/AIRFLOW-3442

> Airflow should collect display names, not Firstname / Lastname
> --
>
> Key: AIRFLOW-3535
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3535
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: database, ui
>Affects Versions: 1.10.1
>Reporter: James Meickle
>Priority: Minor
>
> We use Google OAuth to provision our Airflow accounts. This creates "user 
> names" of "google_12345", with the corresponding email address. The first and 
> last name of the user are pulled into the corresponding Airflow fields.
> In general, though, First Name / Last Name is not considered a good pattern 
> for user systems unless they are actually critical to handle business logic. 
> Further reading on problems that can cause here: 
> https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
> We should condense these fields into a totally freeform "Display Name", and 
> use that more consistently in the UI. For example, in AIRFLOW-3442, an 
> internal username is displayed rather than a display name. (In the case of an 
> audit log, the right value is probably: `Display Name (internal_name)`.)





[jira] [Commented] (AIRFLOW-4470) RBAC Github Enterprise OAuth provider callback URL?

2019-12-01 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985506#comment-16985506
 ] 

jack commented on AIRFLOW-4470:
---

[~Nidhi94_] 
So is this a bug or a documentation issue?

> RBAC Github Enterprise OAuth provider callback URL?
> ---
>
> Key: AIRFLOW-4470
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4470
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, webserver
>Affects Versions: 1.10.2
>Reporter: Geez
>Priority: Blocker
>  Labels: usability
> Attachments: airflow_ss0_2.PNG, image-2019-10-30-16-25-14-436.png, 
> image-2019-10-31-11-47-04-041.png
>
>
> Hi all,
> Quick question, when using RBAC with OAuth providers (1.10.2):
>  * we are not specifying the {{authenticate}} or {{auth_backend}} in the 
> [webserver] section of {{airflow.cfg}} anymore
>  * Instead, we set the OAuth provider config in the flask-appbuilder's 
> {{webserver_config.py}}:
> {code:java}
>  
> # Adapting Google OAuth example to Github:
> OAUTH_PROVIDERS = [
> {'name':'github', 'icon':'fa-github', 'token_key':'access_token',
>  'remote_app': {
> 'base_url':'https://github.corporate-domain.com/login',
> 
> 'access_token_url':'https://github.corporate-domain.com/login/oauth/access_token',
> 
> 'authorize_url':'https://github.corporate-domain.com/login/oauth/authorize',
> 'request_token_url': None,
> 'consumer_key': '',
> 'consumer_secret': 'X',
>  }
> }
> ]
>  
> {code}
>  _Question:_
>  * so what callback URL do we specify in the app? 
> {{http:/webapp/ghe_oauth/callback}} would not work right? (example with 
> GitHub Enterprise)
> No matter what I specify for the callback url (/ghe_oauth/callback or 
> [http://webapp.com|http://webapp.com/]), I get an error message about 
> {{redirect_uri}} mismatch:
> {code:java}
> {{error=redirect_uri_mismatch_description=The+redirect_uri+MUST+match+the+registered+callback+URL+for+this+application
>  }}{code}
> _Docs ref:_
>  Here is how you set up OAuth with GitHub Enterprise on Airflow _*without*_ 
> RBAC: 
> [https://airflow.apache.org/security.html#github-enterprise-ghe-authentication]
> And here is how you setup OAuth via the {{webserver_config.py}} of 
> flask_appbuilder used by airflow _*with*_RBAC:
>  
> [https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-oauth]
> What's the *callback url* when using RBAC and OAuth with Airflow?





[jira] [Commented] (AIRFLOW-3329) Initdb suggests wrong command for kubernetes module installation.

2019-12-01 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985499#comment-16985499
 ] 

jack commented on AIRFLOW-3329:
---

Is this still an issue?

> Initdb suggests wrong command for kubernetes module installation.
> -
>
> Key: AIRFLOW-3329
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3329
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: utils
>Affects Versions: 1.10.0
>Reporter: Andriy Sachko
>Priority: Minor
>
> After installing with the command:
> {{pip install apache-airflow[s3]}}
> and running:
> {{airflow initdb}}
> Airflow suggests the wrong command for kubernetes module installation:
> {{WARNI [airflow.utils.log.logging_mixin.LoggingMixin] Could not import 
> KubernetesPodOperator: No module named 'kubernetes'}}
>  {{WARNI [airflow.utils.log.logging_mixin.LoggingMixin] Install kubernetes 
> dependencies with: pip install airflow['kubernetes']}}
> It should be {{pip install apache-airflow['kubernetes']}}.
>  





[jira] [Commented] (AIRFLOW-2906) DataDog Integration for Airflow

2019-12-01 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985498#comment-16985498
 ] 

jack commented on AIRFLOW-2906:
---

[~cckavar] do you have something ready?

> DataDog Integration for Airflow
> ---
>
> Key: AIRFLOW-2906
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2906
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: utils
>Affects Versions: 1.8.0
>Reporter: Austin Hsu
>Assignee: Chandu Kavar
>Priority: Minor
>  Labels: metrics
>
> Add functionality to Airflow to enable sending of metrics to DataDog.  
> DataDog provides support for tags which allows us to aggregate data more 
> easily and visualize it.  We can utilize the [Datadog python 
> library|https://github.com/DataDog/datadogpy] python library and the [Datadog 
> ThreadStats 
> module|https://datadogpy.readthedocs.io/en/latest/#datadog-threadstats-module]
>  to send metrics directly to DataDog without needing to spin up an agent to 
> forward the metrics.  The current implementation in 1.8 uses the statsd 
> library to send the metrics which provides us with much less control to 
> filter our data.
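> A minimal sketch of the ThreadStats usage (real datadogpy API; the key 
> values and metric/tag names are placeholders):
> {code:python}
> from datadog import initialize, ThreadStats
> 
> initialize(api_key='', app_key='')  # placeholder credentials
> stats = ThreadStats()
> stats.start()  # flushes metrics to DataDog from a background thread
> stats.increment('airflow.task.success', tags=['dag:example_dag'])
> {code}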





[jira] [Commented] (AIRFLOW-3656) Airflow Web UI link to the docs should be dynamic to Airflow version

2019-11-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984429#comment-16984429
 ] 

jack commented on AIRFLOW-3656:
---

I think this can be revisited now that the new website is live.

> Airflow Web UI link to the docs should be dynamic to Airflow version
> 
>
> Key: AIRFLOW-3656
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3656
> Project: Apache Airflow
>  Issue Type: Task
>  Components: ui
>Affects Versions: 1.10.1
>Reporter: jack
>Priority: Major
> Attachments: 0101.PNG
>
>
> Currently in the UI Docs->Documentation directs to 
> [https://airflow.incubator.apache.org/]
> # This should be changed to [https://airflow.readthedocs.io/en/stable/] 
> because on readthedocs the user can actually select the docs version, while 
> the current one refers only to the master branch and the user can't change 
> it, nor would they even know it.
> # Preferably, clicking on Docs->Documentation would pick up the Airflow 
> version and point directly to the user's Airflow version. Meaning that if the 
> user runs Airflow 1.10.0 it will point to 
> [https://airflow.readthedocs.io/en/1.10.0/]  The Airflow version is already 
> visible in the UI (About->Version) so it shouldn't be difficult to build 
> this link.
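> A sketch of how the link could be built (assuming {{airflow.__version__}}, 
> which the About page already exposes):
> {code:python}
> import airflow
> 
> docs_url = 'https://airflow.readthedocs.io/en/{}/'.format(airflow.__version__)
> {code}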
>  
> Previous PR that was related to the doc link is (by [~kaxilnaik]):
> https://github.com/apache/airflow/pull/3050





[jira] [Commented] (AIRFLOW-625) doc_md in concepts document seems wrong

2019-11-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984427#comment-16984427
 ] 

jack commented on AIRFLOW-625:
--

[~ash] can be closed?

> doc_md in concepts document seems wrong
> ---
>
> Key: AIRFLOW-625
> URL: https://issues.apache.org/jira/browse/AIRFLOW-625
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.7.1
> Environment: CentOS 7.2 
>Reporter: Flex Gao
>Priority: Major
>
> In 
> [https://github.com/apache/incubator-airflow/blob/master/docs/concepts.rst] 
> it says *doc_md* is an attribute of a task, but this gives an error on the 
> webserver's *Graph* tab.
> Example Dag file:
> {code}
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> from datetime import datetime, timedelta
> SCRIPTS_PATH = '/var/lib/airflow/scripts'
> default_args = {
> 'depends_on_past': False,
> 'start_date': datetime(2016, 11, 9, 0, 55),
> }
> dag = DAG('elasticsearch', default_args=default_args, 
> schedule_interval=timedelta(days=1))
> t1 = BashOperator(
> task_id='daily_index_delete',
> bash_command='%s/es_clean.py' % SCRIPTS_PATH,
> dag=dag)
> t1.doc_md = """\
>  Task Documentation
> Clean ES Indeices Every Day
> """
> {code}
> This will give a traceback: 
> *AttributeError: 'NoneType' object has no attribute 'strip'*
> But if I change *t1.doc_md* to *dag.doc_md*, everything is ok.





[jira] [Commented] (AIRFLOW-6086) SparkSubmitOperator - Unable to override spark_binary

2019-11-27 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983624#comment-16983624
 ] 

jack commented on AIRFLOW-6086:
---

duplicate of https://issues.apache.org/jira/browse/AIRFLOW-5517

> SparkSubmitOperator - Unable to override spark_binary 
> --
>
> Key: AIRFLOW-6086
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6086
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, core
>Affects Versions: 1.10.6
>Reporter: Florian FERREIRA
>Priority: Major
>
> Hello,
> I have a connection "spark2_default" : 
> || Conn Id || Conn Type ||  Host || Port || Is Encrypted || Is Extra 
> Encrypted || Extra  
> | 'spark2_default' | 'spark2'  | 'yarn-cluster' | None | False | False | 
> {"master":"yarn-cluster","deploy-mode":"cluster","spark-binary":"spark2-submit"}
>  |
> Extra contains a 'spark-binary' key that was used by Airflow 1.10.2 to choose 
> the spark-submit binary. But in version 1.10.6 this config is ignored.
> I think that, in class SparkSubmitOperator, the init function has a 
> default value "spark-submit" for the spark_binary parameter: 
> {code}
>  spark_binary="spark-submit",
> {code}
> Therefore in class SparkSubmitHook, when we check whether spark_binary is 
> empty, it never is:
> {code}
> conn_data['spark_binary'] = self._spark_binary or  \
> extra.get('spark-binary', "spark-submit")
> {code}





[jira] [Commented] (AIRFLOW-3407) BaseOperator and LoggingMixin do not call super().__init__

2019-11-27 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983555#comment-16983555
 ] 

jack commented on AIRFLOW-3407:
---

Also, LoggingMixin doesn't inherit from any other class, so the content of 
this Jira may need to be edited:

[https://github.com/apache/airflow/blob/master/airflow/utils/log/logging_mixin.py#L36]

> BaseOperator and LoggingMixin do not call super().__init__
> --
>
> Key: AIRFLOW-3407
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3407
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.1
>Reporter: adam hitchcock
>Assignee: Chao-Han Tsai
>Priority: Major
>
> The {{BaseOperator}} is not necessarily the last class in the MRO; usually it 
> is best practice to always call {{super().__init__(*args, **kwargs)}}
>  to make sure that every class gets its chance to {{__init__}}.
> Is there a specific reason {{BaseOperator}} doesn't call super?
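> For illustration, the cooperative pattern the report asks for (a sketch, not 
> Airflow's actual code):
> {code:python}
> class BaseOperator(LoggingMixin):
>     # LoggingMixin as in airflow.utils.log.logging_mixin
>     def __init__(self, task_id, **kwargs):
>         super().__init__(**kwargs)  # give every class in the MRO its turn
>         self.task_id = task_id
> {code}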





[jira] [Commented] (AIRFLOW-2011) Airflow ampq pool maintains dead connections

2019-11-27 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983467#comment-16983467
 ] 

jack commented on AIRFLOW-2011:
---

The default value of broker_pool_limit in Celery is 10:

[https://docs.celeryproject.org/en/latest/userguide/configuration.html]

I wonder where it gets overwritten to None?

> Airflow ampq pool maintains dead connections
> 
>
> Key: AIRFLOW-2011
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2011
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, scheduler
>Affects Versions: 1.10.0
> Environment: OS: Ubuntu 16.04 LTS (debian)
> Python: 3.6.3
> Airflow: 1.9.1rc1
>Reporter: Kevin Reilly
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Airflow scheduler deadlocks on queue-up for tasks
> [2018-01-08 07:01:09,315] \{{celery_executor.py:101}} ERROR - Error syncing 
> the celery executor, ignoring it:
> [2018-01-08 07:01:09,315] \{{celery_executor.py:102}} ERROR - [Errno 104] 
> Connection reset by peer
> Traceback (most recent call last):
> File 
> "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py",
>  line 83, in
> state = async.state
> File "/usr/local/lib/python3.6/dist-packages/celery/result.py", line 436, in 
> state
> return self._get_task_meta()['status']
> File "/usr/local/lib/python3.6/dist-packages/celery/result.py", line 375, in 
> _get_task_meta
> return self._maybe_set_cache(self.backend.get_task_meta(self.id))
> File "/usr/local/lib/python3.6/dist-packages/celery/backends/rpc.py", line 
> 244, in get_task_meta
> for acc in self._slurp_from_queue(task_id, self.accept, backlog_limit):
> File "/usr/local/lib/python3.6/dist-packages/celery/backends/rpc.py", line 
> 278, in
> binding.declare()
> File "/usr/local/lib/python3.6/dist-packages/kombu/entity.py", line 605, in 
> declare
> self._create_queue(nowait=nowait, channel=channel)
> File "/usr/local/lib/python3.6/dist-packages/kombu/entity.py", line 614, in 
> _create_queue
> self.queue_declare(nowait=nowait, passive=False, channel=channel)
> File "/usr/local/lib/python3.6/dist-packages/kombu/entity.py", line 649, in 
> queue_declare
> nowait=nowait,
> File "/usr/local/lib/python3.6/dist-packages/amqp/channel.py", line 1147, in 
> queue_declare
> nowait, arguments),
> File "/usr/local/lib/python3.6/dist-packages/amqp/abstract_channel.py", line 
> 50, in send_method
> conn.frame_writer(1, self.channel_id, sig, args, content)
> File "/usr/local/lib/python3.6/dist-packages/amqp/method_framing.py", line 
> 166, in write_frame
> write(view[:offset])
> File "/usr/local/lib/python3.6/dist-packages/amqp/transport.py", line 258, in 
> write
> self._write(s)
> ConnectionResetError: [Errno 104] Connection reset by peer
> Editing the celery settings file (default_celery.py) and adding
> "broker_pool_limit": None,
> between lines 37 and 38 would solve the issue.  This particular setting 
> requires celery to create a new amqp connection each time it needs one, 
> thereby preventing the rabbitmq server from disconnecting a connection 
> without the client noticing and leaving broken sockets open for use.
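> A sketch of the suggested override (hedged; {{DEFAULT_CELERY_CONFIG}} is the 
> real dict in airflow/config_templates/default_celery.py, while the override 
> module itself is hypothetical and would be wired in via the 
> {{celery_config_options}} setting):
> {code:python}
> from airflow.config_templates.default_celery import DEFAULT_CELERY_CONFIG
> 
> # broker_pool_limit=None disables the pool, so each use opens a fresh
> # AMQP connection instead of reusing one the broker may have dropped.
> CELERY_CONFIG = {**DEFAULT_CELERY_CONFIG, 'broker_pool_limit': None}
> {code}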





[jira] [Issue Comment Deleted] (AIRFLOW-2327) Cannot pickle PythonOperator dags

2019-11-27 Thread jack (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-2327:
--
Comment: was deleted

(was: Mesos Executor is deprecated in master, so it's unlikely this will ever 
be fixed.)

> Cannot pickle PythonOperator dags 
> --
>
> Key: AIRFLOW-2327
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2327
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.9.0
> Environment: prod
>Reporter: niraja b
>Priority: Major
>
> We are using the MesosExecutor of Airflow.
>  
> BashOperator and SimpleHTTPOperator work for us.
> The Scheduler is started using -p to pickle the DAGs.
>  
>  
> The issue we have is with the following sample code. We tried with and 
> without use_dill, with PythonOperator and with 
> PythonVirtualenvOperator; we couldn't get it successfully working on the agent.
>  
> from __future__ import print_function
> from airflow.models import DAG
> from datetime import timedelta, datetime
> from airflow.operators.python_operator import 
> PythonOperator,PythonVirtualenvOperator
> DAG_ID = "testdag"
> DEFAULT_ARGS = {
>  "start_date": datetime(2018, 4, 16, 1, 50, 16),
>  "schedule_interval": None,
>  "dagrun_timeout": timedelta(minutes=60),
>  "email": ['t...@test.com'],
>  "email_on_failure": True,
>  "email_on_retry": False,
>  "retries": 3,
>  "retry_delay": timedelta(seconds=5),
> }
> def _testlambda(**kwargs):
>  print("hello world")
> with DAG(dag_id=DAG_ID, default_args=DEFAULT_ARGS) as dag:
>  (
>  PythonVirtualenvOperator(
>  task_id='python_1',
>  python_callable=_testlambda, 
>  use_dill=True,
>  requirements=['dill']
>  )
>  )
>  
> Error 
>  
> Traceback (most recent call last):
>   File "/usr/bin/airflow", line 27, in 
>     args.func(args)
>   File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 358, in run
>     DagPickle).filter(DagPickle.id == args.pickle).first()
>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 
> 2789, in first
>     ret = list(self[0:1])
>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 
> 2581, in __getitem__
>     return list(res)
>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 
> 137, in instances
>     util.raise_from_cause(err)
>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 
> 203, in raise_from_cause
>     reraise(type(exception), exception, tb=exc_tb, cause=cause)
>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 
> 102, in instances
>     logging.debug(str(fetch[0]))
>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/result.py", line 
> 156, in __repr__
>     return repr(sql_util._repr_row(self))
>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/util.py", line 329, 
> in __repr__
>     ", ".join(trunc(value) for value in self.row),
>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/sqltypes.py", line 
> 1588, in process
>     return loads(value)
>   File "/usr/lib/python2.7/site-packages/dill/dill.py", line 299, in loads
>     return load(file)
>   File "/usr/lib/python2.7/site-packages/dill/dill.py", line 288, in load
>     obj = pik.load()
>   File "/usr/lib64/python2.7/pickle.py", line 858, in load
>     dispatch[key](self)
>   File "/usr/lib64/python2.7/pickle.py", line 1090, in load_global
>     klass = self.find_class(module, name)
>   File "/usr/lib/python2.7/site-packages/dill/dill.py", line 445, in 
> find_class
>     return StockUnpickler.find_class(self, module, name)
>   File "/usr/lib64/python2.7/pickle.py", line 1124, in find_class
>     __import__(module)
> ImportError: No module named 
> unusual_prefix_ac646764c974ff68b827793414d8eabcdca720cf_dmitrydag
> I0416 11:22:34.367975 47476 executor.cpp:938] Command exited with status 1 
> (pid: 47482)
> I0416 11:22:35.371712 47481 process.cpp:887] Failed to accept socket: future 
> discarded
>  
>  





[jira] [Commented] (AIRFLOW-4470) RBAC Github Enterprise OAuth provider callback URL?

2019-11-27 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983383#comment-16983383
 ] 

jack commented on AIRFLOW-4470:
---

same issue as https://issues.apache.org/jira/browse/AIRFLOW-2992 ?

> RBAC Github Enterprise OAuth provider callback URL?
> ---
>
> Key: AIRFLOW-4470
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4470
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, webserver
>Affects Versions: 1.10.2
>Reporter: Geez
>Priority: Blocker
>  Labels: usability
> Attachments: airflow_ss0_2.PNG, image-2019-10-30-16-25-14-436.png, 
> image-2019-10-31-11-47-04-041.png
>
>
> Hi all,
> Quick question, when using RBAC with OAuth providers (1.10.2):
>  * we are not specifying the {{authenticate}} or {{auth_backend}} in the 
> [webserver] section of {{airflow.cfg}} anymore
>  * Instead, we set the OAuth provider config in the flask-appbuilder's 
> {{webserver_config.py}}:
> {code:java}
>  
> # Adapting Google OAuth example to Github:
> OAUTH_PROVIDERS = [
> {'name':'github', 'icon':'fa-github', 'token_key':'access_token',
>  'remote_app': {
> 'base_url':'https://github.corporate-domain.com/login',
> 
> 'access_token_url':'https://github.corporate-domain.com/login/oauth/access_token',
> 
> 'authorize_url':'https://github.corporate-domain.com/login/oauth/authorize',
> 'request_token_url': None,
> 'consumer_key': '',
> 'consumer_secret': 'X',
>  }
> }
> ]
>  
> {code}
>  _Question:_
>  * so what callback URL do we specify in the app? 
> {{http:/webapp/ghe_oauth/callback}} would not work right? (example with 
> GitHub Enterprise)
> No matter what I specify for the callback url (/ghe_oauth/callback or 
> [http://webapp.com|http://webapp.com/]), I get an error message about 
> {{redirect_uri}} mismatch:
> {code:java}
> {{error=redirect_uri_mismatch_description=The+redirect_uri+MUST+match+the+registered+callback+URL+for+this+application
>  }}{code}
> _Docs ref:_
>  Here is how you set up OAuth with GitHub Enterprise on Airflow _*without*_ 
> RBAC: 
> [https://airflow.apache.org/security.html#github-enterprise-ghe-authentication]
> And here is how you setup OAuth via the {{webserver_config.py}} of 
> flask_appbuilder used by airflow _*with*_RBAC:
>  
> [https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-oauth]
> What's the *callback url* when using RBAC and OAuth with Airflow?





[jira] [Commented] (AIRFLOW-3407) BaseOperator and LoggingMixin do not call super().__init__

2019-11-27 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983374#comment-16983374
 ] 

jack commented on AIRFLOW-3407:
---

This was fixed in another PR

[https://github.com/apache/airflow/blob/master/airflow/models/baseoperator.py#L322]

> BaseOperator and LoggingMixin do not call super().__init__
> --
>
> Key: AIRFLOW-3407
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3407
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.1
>Reporter: adam hitchcock
>Assignee: Chao-Han Tsai
>Priority: Major
>
> The {{BaseOperator}} is not necessarily the last class in the MRO; usually it 
> is best practice to always call {{super().__init__(*args, **kwargs)}}
>  to make sure that every class gets its chance to {{__init__}}.
> Is there a specific reason {{BaseOperator}} doesn't call super?
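For illustration, a minimal sketch of what goes wrong without a cooperative call (class names here are hypothetical, not Airflow's actual code):

{code:python}
class BaseOperator(object):
    def __init__(self, *args, **kwargs):
        self.operator_initialized = True
        # no super().__init__() call here, so any class that comes after
        # BaseOperator in the MRO never gets its __init__ run

class MyMixin(object):
    def __init__(self, *args, **kwargs):
        self.mixin_initialized = True
        super(MyMixin, self).__init__(*args, **kwargs)

class MyOperator(BaseOperator, MyMixin):
    pass

op = MyOperator()
print(hasattr(op, 'operator_initialized'))  # True
print(hasattr(op, 'mixin_initialized'))     # False: MyMixin.__init__ never ran
{code}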



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-2433) Operator with TriggerRule "one_success" with multiple upstream tasks is marked as Skipped instead of UpstreamFailed if all its upstream tasks are in "UpstreamFailed"

2019-11-27 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983368#comment-16983368
 ] 

jack commented on AIRFLOW-2433:
---

[~vigneshwaran] is this still an issue?

> Operator with TriggerRule "one_success" with multiple upstream tasks is 
> marked as Skipped instead of UpstreamFailed if all its upstream tasks are in 
> "UpstreamFailed" status
> 
>
> Key: AIRFLOW-2433
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2433
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.9.0
>Reporter: Vigneshwaran Raveendran
>Priority: Major
> Attachments: Airflow_1.9_incorrect_state_issue.png
>
>
> I have a task with trigger_rule "one_success" with two upstream tasks.
> When all its upstream tasks are in UpstreamFailed, the task and all its 
> downstream tasks are marked as "Skipped" instead of the expected "UpstreamFailed".
> Since the root tasks end up in Skipped status and not in UpstreamFailed, the 
> DAG is marked as Success instead of the expected Failed status.
> Please see the attachment for reference. The "step 8" is the task with 
> trigger rule "one_success".
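For reference, a minimal sketch that should reproduce the reported conditions (DAG and task ids are made up):

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.trigger_rule import TriggerRule

dag = DAG('one_success_repro', start_date=datetime(2019, 1, 1),
          schedule_interval='@once')

# `fail` fails, so both middle tasks end up upstream_failed
fail = BashOperator(task_id='fail', bash_command='exit 1', dag=dag)
mid_a = DummyOperator(task_id='mid_a', dag=dag)
mid_b = DummyOperator(task_id='mid_b', dag=dag)
# expected: upstream_failed; reported: skipped
step8 = DummyOperator(task_id='step8', trigger_rule=TriggerRule.ONE_SUCCESS,
                      dag=dag)

fail >> [mid_a, mid_b] >> step8
{code}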



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5920) Add support to execute OpenCypher query against Neo4j

2019-11-27 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983312#comment-16983312
 ] 

jack commented on AIRFLOW-5920:
---

[~tfindlay] if the operator is functional and working, it doesn't need to be a 
draft version.

A draft is for testing something that may not work until it's stable.

So if the functionality you wish to add is working in your local environment 
and it just needs peer review to get it to Airflow's standard, you can remove 
"Draft" from the title.

> Add support to execute OpenCypher query against Neo4j
> -
>
> Key: AIRFLOW-5920
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5920
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks, operators
>Affects Versions: 1.10.7
>Reporter: Timothy Findlay
>Assignee: Timothy Findlay
>Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> As a DAG developer
> I want to create DAG tasks to execute OpenCypher queries against a graph 
> database
> So that the output can be used elsewhere in a DAG / business
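Until a dedicated hook lands, a minimal sketch of one way to run Cypher from a task today, using the official {{neo4j}} Python driver (the URI, credentials and query are placeholders):

{code:python}
from neo4j import GraphDatabase

def run_cypher(uri, user, password, query):
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            # materialize the result so it can be passed on (e.g. via XCom)
            return [record.data() for record in session.run(query)]
    finally:
        driver.close()

# e.g. inside a DAG definition:
# PythonOperator(task_id='count_nodes', python_callable=run_cypher,
#                op_args=['bolt://localhost:7687', 'neo4j', 'secret',
#                         'MATCH (n) RETURN count(n) AS c'],
#                dag=dag)
{code}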



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6070) On the admin dashboard, the recent tasks column has no tooltip for tasks in 'null' state

2019-11-26 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982535#comment-16982535
 ] 

jack commented on AIRFLOW-6070:
---

Null / None always needs special treatment.
Even though these are different issues, I think this probably has the same root 
cause as https://issues.apache.org/jira/browse/AIRFLOW-4314

> On the admin dashboard, the recent tasks column has no tooltip for tasks in 
> 'null' state
> 
>
> Key: AIRFLOW-6070
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6070
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.2
> Environment: GCP Composer composer-1.7.5-airflow-1.10.2
>Reporter: Adam Hopkinson
>Priority: Trivial
> Attachments: image-2019-11-26-10-03-40-377.png
>
>
> On the DAGS listing template, the circles in the _Recent Tasks_ column all 
> have a tooltip apart from the second to last - which is for tasks with state 
> = `null`
> !image-2019-11-26-10-03-40-377.png|width=261,height=37!
> I believe this is happening in [this line of 
> code|https://github.com/apache/airflow/blob/0ff9e2307042ba95e69b32e37f2fc767a5fdc36d/airflow/www/templates/airflow/dags.html#L447],
>  which is:
> {{.attr('title', function(d) {return d.state || 'none'})}}
> I'm not sure why it's not falling back to 'none' - I think it's possibly 
> seeing the value of d.state as the text value 'null' rather than a true null, 
> but then writing that into the title as a true null.
> I'm using GCP Composer, so don't have a local instance of Airflow that I can 
> test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4742) Change ui icon of failed dag runs to a darker shade of red

2019-11-26 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982518#comment-16982518
 ] 

jack commented on AIRFLOW-4742:
---

[~jason_ma] you can open a PR with your change against the airflow repo and get 
comments there. No need to do that in your own fork.

> Change ui icon of failed dag runs to a darker shade of red
> --
>
> Key: AIRFLOW-4742
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4742
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: 1.10.3
>Reporter: Jason Ma
>Priority: Major
> Attachments: screenshot-1.png
>
>
> Hi, due to my colorblindness, I've noticed that it's pretty hard to separate 
> the red and green icons on the dag run. Would it be at all possible to change 
> the shade of red to a darker shade, such as the one provided above?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1717) AttributeError while clicking on dag on webUI

2019-11-21 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979495#comment-16979495
 ] 

jack commented on AIRFLOW-1717:
---

There have been many releases since 1.8; clicking on a DAG in the UI works now for sure :)

> AttributeError while clicking on dag on webUI
> -
>
> Key: AIRFLOW-1717
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1717
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.8.0
>Reporter: Ambrish Bhargava
>Priority: Major
>
> Simple DAG
> {code}from airflow import DAG
> from airflow.contrib.operators.qubole_operator import QuboleOperator
> from datetime import datetime, timedelta
>  
> # Default args
> default_args = {
> 'owner': 'airflow',
> 'depends_on_past': False,
> 'start_date': datetime(2017, 8, 1),
> 'email': ['airf...@airflow.com'],
> 'email_on_failure': True,
> 'email_on_retry': False,
> 'retries': 1,
> 'retry_delay': timedelta(minutes=5),
> }
>  
> # Dag information
> dag = DAG(
> 'qubole_test',
> default_args=default_args,
> schedule_interval='@daily')
>  
> # Actual steps
> hive_cmd = QuboleOperator(
> command_type='hivecmd',
> task_id='qubole_show_tables',
> query='use schema;show tables;',
> cluster_label='default',
> qubole_conn_id = 'airflow_qubole',
> dag=dag){code}
> When I ran this dag on CLI, it worked fine. But when I tried to click the DAG 
> on web UI, I am getting following error:
> {code}Traceback (most recent call last):
>   File "/usr/local/lib64/python2.7/site-packages/flask/app.py", line 1988, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib64/python2.7/site-packages/flask/app.py", line 1641, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/usr/local/lib64/python2.7/site-packages/flask/app.py", line 1544, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib64/python2.7/site-packages/flask/app.py", line 1639, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib64/python2.7/site-packages/flask/app.py", line 1625, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/local/lib/python2.7/site-packages/flask_admin/base.py", line 69, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python2.7/site-packages/flask_admin/base.py", line 
> 368, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python2.7/site-packages/flask_login.py", line 755, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/site-packages/airflow/www/utils.py", line 
> 219, in view_func
> return f(*args, **kwargs)
>   File "/usr/local/lib/python2.7/site-packages/airflow/www/utils.py", line 
> 125, in wrapper
> return f(*args, **kwargs)
>   File "/usr/local/lib/python2.7/site-packages/airflow/www/views.py", line 
> 1229, in tree
> 'children': [recurse_nodes(t, set()) for t in dag.roots],
>   File "/usr/local/lib/python2.7/site-packages/airflow/www/views.py", line 
> 1191, in recurse_nodes
> if node_count[0] < node_limit or t not in visited]
>   File "/usr/local/lib/python2.7/site-packages/airflow/www/views.py", line 
> 1216, in recurse_nodes
> for d in dates],
> AttributeError: 'NoneType' object has no attribute 'isoformat'{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3185) Add chunking to DBAPI_hook by implementing fetchmany and pandas chunksize

2019-11-21 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979489#comment-16979489
 ] 

jack commented on AIRFLOW-3185:
---

[~tomanizer] do you have a final version to PR?

> Add chunking to DBAPI_hook by implementing fetchmany and pandas chunksize
> -
>
> Key: AIRFLOW-3185
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3185
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.0
>Reporter: Thomas Haederle
>Assignee: Thomas Haederle
>Priority: Minor
>  Labels: easyfix
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> DbApiHook currently implements get_records and get_pandas_df, where both 
> methods fetch all records into memory.
> We should implement two new methods which return a generator with a 
> configurable chunksize:
> - def get_many_records(self, sql, parameters=None, chunksize=20, 
> iterate_singles=False):
> - def get_pandas_df_chunks(self, sql, parameters=None, chunksize=20)
> This should work for all DB hooks which inherit from this class.
> We could also adapt existing methods, but that could be problematic because 
> these methods will return a generator whereas the others return either 
> records or dataframes.
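A minimal sketch of what the two proposed methods could look like, assuming a DB-API 2.0 connection from {{get_conn()}} and pandas' built-in {{chunksize}} support (the proposed {{iterate_singles}} flag is omitted for brevity):

{code:python}
import pandas as pd

def get_many_records(self, sql, parameters=None, chunksize=20):
    """Yield lists of up to `chunksize` rows instead of fetching everything."""
    cur = self.get_conn().cursor()
    if parameters is not None:
        cur.execute(sql, parameters)
    else:
        cur.execute(sql)
    while True:
        rows = cur.fetchmany(chunksize)
        if not rows:
            break
        yield rows

def get_pandas_df_chunks(self, sql, parameters=None, chunksize=20):
    """Return an iterator of DataFrames, one DataFrame per chunk."""
    return pd.read_sql(sql, con=self.get_conn(), params=parameters,
                       chunksize=chunksize)
{code}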



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4938) next_execution_date is not a Pendulum object

2019-11-21 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979371#comment-16979371
 ] 

jack commented on AIRFLOW-4938:
---

Was fixed in 1.10.4:

https://issues.apache.org/jira/browse/AIRFLOW-4788

> next_execution_date is not a Pendulum object
> 
>
> Key: AIRFLOW-4938
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4938
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 1.10.2
> Environment: Airflow 1.10.2 in Docker using puckel docker image.
> Configured for timezone Copenhagen
>Reporter: Adam Andersen Læssøe
>Priority: Major
>
> When templating, it seems `execution_date` refers to a Pendulum object while 
> `next_execution_date` refers to a native `datetime` object.
>  This is inconsistent, and contrary to what the docs say, so I'm thinking 
> it's a bug.
> *Observation*
>  Airflow is configured to run in timezone Copenhagen.
>  I have a where clause like
> {code:java}
> WHERE inserted_at >= '{{execution_date}}' AND inserted_at < 
> '{{next_execution_date}}'{code}
> I execute the task using `airflow test ... 2019-07-01`.
>  The clause is rendered as
> {code:java}
> WHERE inserted_at >= '2019-07-01T00:00:00+02:00' AND inserted_at < 
> '2019-07-01 22:00:00+00:00'{code}
>  
> Note how `execution_date` is printed as UTC+2, while `next_execution_date` is 
> printed in UTC. I believe the timestamps actually describe the correct 
> interval, but to be certain I tried explicitly converting to UTC:
> {code:java}
> WHERE inserted_at >= '{{execution_date.in_tz('UTC')}}' AND inserted_at < 
> '{{next_execution_date.in_tz('UTC')}}'{code}
> I then get an error that datetime.datetime does not have an in_tz method.
>  
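A minimal sketch of a workaround on affected versions: wrap the value with pendulum before converting ({{to_utc}} is a made-up macro name):

{code:python}
import pendulum

def to_utc(dt):
    # pendulum.instance() accepts plain datetimes as well as pendulum ones
    return pendulum.instance(dt).in_tz('UTC')

# dag = DAG(..., user_defined_macros={'to_utc': to_utc})
# and in the template:
# WHERE inserted_at >= '{{ to_utc(execution_date) }}'
#   AND inserted_at <  '{{ to_utc(next_execution_date) }}'
{code}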



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (AIRFLOW-4938) next_execution_date is not a Pendulum object

2019-11-21 Thread jack (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-4938:
--
Comment: was deleted

(was: Related/duplicate to/of :

https://issues.apache.org/jira/browse/AIRFLOW-4788)

> next_execution_date is not a Pendulum object
> 
>
> Key: AIRFLOW-4938
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4938
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 1.10.2
> Environment: Airflow 1.10.2 in Docker using puckel docker image.
> Configured for timezone Copenhagen
>Reporter: Adam Andersen Læssøe
>Priority: Major
>
> When templating, it seems `execution_date` refers to a Pendulum object while 
> `next_execution_date` refers to a native `datetime` object.
>  This is inconsistent, and contrary to what the docs say, so I'm thinking 
> it's a bug.
> *Observation*
>  Airflow is configured to run in timezone Copenhagen.
>  I have a where clause like
> {code:java}
> WHERE inserted_at >= '{{execution_date}}' AND inserted_at < 
> '{{next_execution_date}}'{code}
> I execute the task using `airflow test ... 2019-07-01`.
>  The clause is rendered as
> {code:java}
> WHERE inserted_at >= '2019-07-01T00:00:00+02:00' AND inserted_at < 
> '2019-07-01 22:00:00+00:00'{code}
>  
> Note how `execution_date` is printed as UTC+2, while `next_execution_date` is 
> printed in UTC. I believe the timestamps actually describe the correct 
> interval, but to be certain I tried explicitly converting to UTC:
> {code:java}
> WHERE inserted_at >= '{{execution_date.in_tz('UTC')}}' AND inserted_at < 
> '{{next_execution_date.in_tz('UTC')}}'{code}
> I then get an error that datetime.datetime does not have an in_tz method.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-578) BaseJob does not check return code of a process

2019-11-19 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978161#comment-16978161
 ] 

jack commented on AIRFLOW-578:
--

[~lucafuji] are you still working on it?

> BaseJob does not check return code of a process
> ---
>
> Key: AIRFLOW-578
> URL: https://issues.apache.org/jira/browse/AIRFLOW-578
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Ze Wang
>Priority: Major
>
> BaseJob ignores the return code of the spawned process, so even when the 
> process is killed or exits abnormally, the job treats it as having finished 
> successfully.
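For reference, a minimal sketch of the kind of check being asked for (not the actual BaseJob code; the command is a placeholder):

{code:python}
import subprocess

proc = subprocess.Popen(['airflow', 'run', 'some_dag', 'some_task', '2019-01-01'])
returncode = proc.wait()
if returncode != 0:
    # surface abnormal termination instead of silently treating it as success
    raise RuntimeError('Process exited with return code %s' % returncode)
{code}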



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5060) Add support of CatalogId to AwsGlueCatalogHook

2019-11-19 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977628#comment-16977628
 ] 

jack commented on AIRFLOW-5060:
---

[~ash] any thoughts?

> Add support of CatalogId to AwsGlueCatalogHook
> --
>
> Key: AIRFLOW-5060
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5060
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks
>Affects Versions: 1.10.3
>Reporter: Ilya Kisil
>Assignee: Ilya Kisil
>Priority: Minor
>
> h2. Use Case
> Imagine that you stream data into an S3 bucket of *account A* and update the AWS 
> Glue datacatalog on a daily basis, so that you can query new data with AWS 
> Athena. Now let's assume that you provided access to this S3 bucket to an 
> external *account B* who wants to use its own AWS Athena to query your data 
> in exactly the same way. Unfortunately, *account B* would need to have 
> exactly the same table definitions in its AWS Glue Datacatalog, because AWS 
> Athena cannot run against an external glue datacatalog. However, AWS Glue 
> service supports [cross-account datacatalog 
> access|https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html],
>  which means that *account B* can simply copy/sync meta information about 
> databases, tables, partitions etc. from the glue data catalog of *account A*, 
> provided additional permissions have been granted. Thus, all methods in 
> *AwsGlueCatalogHook* should use a "CatalogId", i.e. the ID of the Data Catalog 
> from which to retrieve/create/delete.
> h2. How it fits into Airflow
> Assume that you have an AWSAthenaOperator which queries data once a day; the 
> result is then retrieved, visualised locally and uploaded to some 
> server/website. Before this happens, you simply need to create an 
> operator (even PythonOperator would do) which has two hooks, one to the source 
> catalog and another to the destination catalog. At run time, it would use the 
> source hook to retrieve information from *account A*, for example with 
> [get_partitions()|https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_partitions],
>  then parse the response, remove unnecessary keys and finally use the 
> destination hook to update the *account B* datacatalog with 
> [batch_create_partitions()|https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.batch_create_partition]
>  
> h2. Proposal
>  * Add a parameter *catalog_id* to AwsGlueCatalogHook, which then will be 
> used in all its methods, regardless of this hook associated with source or 
> destination datacatalog. 
>  * In order not to break the existing implementation, we set *catalog_id=None*. 
> But we add a method *fallback_catalog_id()*, which uses AWS STS to infer the 
> Catalog ID associated with the used *aws_conn_id*. The obtained value would be 
> used if *catalog_id* hasn't been provided during hook creation.
>  * Extend the available methods of *AwsGlueCatalogHook* in a similar way to 
> the already existing ones, for convenience of the workflow described above. 
> Note: all new methods should strictly adhere to the AWS Glue Client Request 
> Syntax and do so in a transparent manner. This means that input information 
> shouldn't be modified within a method. When such actions are required, they 
> should be performed outside of the AwsGlueCatalogHook.
> h2. Implementation
>  * I am happy to contribute to airflow if this feature request gets approved.
> h2. Other considerations
>  * At the moment the existing method *get_partitions* doesn't provide you 
> with all the metainformation about partitions available from the glue client, 
> whereas *get_table* does. Don't know the best way around it, but imho it should 
> be refactored to *get_partitions_values* or something like that. In this way, 
> we would be able to stay in line with the boto3 glue client.
>  
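A minimal sketch of the proposed fallback and of a catalog-aware method, written against plain boto3 (the function names come from this ticket's proposal and do not exist in the hook yet):

{code:python}
import boto3

def fallback_catalog_id():
    # by default, the Glue catalog ID is the caller's AWS account ID
    return boto3.client('sts').get_caller_identity()['Account']

def get_partitions(database_name, table_name, catalog_id=None):
    glue = boto3.client('glue')
    paginator = glue.get_paginator('get_partitions')
    pages = paginator.paginate(CatalogId=catalog_id or fallback_catalog_id(),
                               DatabaseName=database_name,
                               TableName=table_name)
    for page in pages:
        for partition in page['Partitions']:
            yield partition
{code}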



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5920) Add support to execute OpenCypher query against Neo4j

2019-11-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976003#comment-16976003
 ] 

jack commented on AIRFLOW-5920:
---

[~tfindlay] you can open a Draft PR and get comments from contributors. 

> Add support to execute OpenCypher query against Neo4j
> -
>
> Key: AIRFLOW-5920
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5920
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks, operators
>Affects Versions: 1.10.7
>Reporter: Timothy Findlay
>Assignee: Timothy Findlay
>Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> As a DAG developer
> I want to create DAG tasks to execute OpenCypher queries against a graph 
> database
> So that the output can be used elsewhere in a DAG / business



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5232) Create pagerdutyhook in the airflow plugins, to support pagerduty.

2019-11-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975997#comment-16975997
 ] 

jack commented on AIRFLOW-5232:
---

pagerduty hook was added in https://issues.apache.org/jira/browse/AIRFLOW-5832

This Jira can be closed

> Create pagerdutyhook in the airflow plugins, to support pagerduty.
> --
>
> Key: AIRFLOW-5232
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5232
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: plugins
>Affects Versions: 1.10.5
>Reporter: sri ram chegondi
>Priority: Major
>
> Create pagerdutyhook in the airflow plugins, to support pagerduty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3344) Airflow DAG object clear function does not clear tasks in the upstream_failed state when only_failed=True

2019-11-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975993#comment-16975993
 ] 

jack commented on AIRFLOW-3344:
---

[~steveatbat] can you PR your suggested change?

> Airflow DAG object clear function does not clear tasks in the upstream_failed 
> state when only_failed=True
> -
>
> Key: AIRFLOW-3344
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3344
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.8.2, 1.9.0, 1.10.0
>Reporter: Steve Jacobs
>Priority: Minor
>  Labels: easyfix, newbie
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When using the airflow clear command from the cli, you can pass an argument 
> --only_failed to clear only failed tasks. This will clear ONLY tasks with the 
> state failed, and not tasks with the state upstream_failed, causing any clear 
> to still fail the dag_run if any upstream tasks are failed.
> Since one_failed as a trigger rule also checks for upstream_failed tasks, it 
> seems consistent that this should also clear upstream_failed tasks. The 
> relevant code change necessary is here:
> {code:java}
> if only_failed:
>  tis = tis.filter(TI.state == State.FAILED)
> {code}
> to
> {code:java}
> if only_failed:
>   tis = tis.filter(TI.state.in_([State.FAILED, State.UPSTREAM_FAILED]))
> {code}
> in models.py
> Additionally when clearing dags, the dag_run is set to the running state, but 
> the dag_run start_date is not updated to the current time, as it is when 
> clearing tasks through the Web UI. This causes dag_runs to fail on their 
> timeouts even if the dag is full of successful tasks. This needs to be 
> changed as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-918) Improve bulk_load function for MySqlHook

2019-11-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975988#comment-16975988
 ] 

jack commented on AIRFLOW-918:
--

Duplicate of https://issues.apache.org/jira/browse/AIRFLOW-5921

[~feluelle] no need for two Jiras

> Improve bulk_load function for MySqlHook
> 
>
> Key: AIRFLOW-918
> URL: https://issues.apache.org/jira/browse/AIRFLOW-918
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.7.1.3
>Reporter: Ali Uz
>Priority: Minor
>  Labels: easyfix, patch
>
> I think we can improve the `bulk_load` function in MySqlHook by adding a few 
> more parameters. For example, if I want to run a LOAD DATA command like the 
> following:
> ```
> LOAD DATA LOCAL INFILE 'abc.csv' INTO TABLE abc
> FIELDS TERMINATED BY ',' 
> ENCLOSED BY '"' 
> LINES TERMINATED BY '\r\n'
> IGNORE 1 LINES
> ```
> I would expect to supply the delimiter parameters, enclosing quotes 
> parameter, line terminating parameter and ignore line number parameter.
> The current function only applies the following command:
> ```
> LOAD DATA LOCAL INFILE 'abc.csv' INTO TABLE abc
> ```
> It would be great if we could extend it.
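A minimal sketch of such an extension, modeled on the existing {{bulk_load}} (parameter names are illustrative; as in the current implementation, inputs are assumed trusted since they are interpolated into the statement):

{code:python}
def bulk_load(self, table, tmp_file, fields_terminated_by=',',
              enclosed_by='"', lines_terminated_by='\\r\\n', ignore_lines=1):
    conn = self.get_conn()
    cur = conn.cursor()
    cur.execute("""
        LOAD DATA LOCAL INFILE '{tmp_file}'
        INTO TABLE {table}
        FIELDS TERMINATED BY '{ft}' ENCLOSED BY '{enc}'
        LINES TERMINATED BY '{lt}'
        IGNORE {n} LINES
        """.format(tmp_file=tmp_file, table=table, ft=fields_terminated_by,
                   enc=enclosed_by, lt=lines_terminated_by,
                   n=int(ignore_lines)))
    conn.commit()
{code}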



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5616) PrestoHook to use prestodb

2019-11-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975985#comment-16975985
 ] 

jack commented on AIRFLOW-5616:
---

By the way PrestoSQL might be better than Prestodb 
[https://github.com/prestosql/presto/issues/380]

> PrestoHook to use prestodb
> --
>
> Key: AIRFLOW-5616
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5616
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.5
>Reporter: Alexandre Brilhante
>Priority: Minor
>
> PrestoHook currently uses PyHive which doesn't support transactions whereas 
> prestodb 
> ([https://github.com/prestodb/presto-python-client])
>  does. I think it would be more flexible to use prestodb as the client. I can work 
> on a PR.
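For reference, a minimal sketch of the prestodb client's DB-API usage (host and connection details are placeholders):

{code:python}
import prestodb

conn = prestodb.dbapi.connect(host='presto.example.com', port=8080,
                              user='airflow', catalog='hive', schema='default')
cur = conn.cursor()
cur.execute('SELECT 1')
print(cur.fetchall())
{code}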



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5941) MySQLHook initialization fails when db charset is utf8mb4

2019-11-17 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975982#comment-16975982
 ] 

jack commented on AIRFLOW-5941:
---

Possible root cause is https://issues.apache.org/jira/browse/AIRFLOW-4824 ?

> MySQLHook initialization fails when db charset is utf8mb4
> -
>
> Key: AIRFLOW-5941
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5941
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.10.6
> Environment: AWS EC2 instance
>Reporter: Tomasz Żukowski
>Priority: Minor
>
> When trying to get the connection from MySQLHook, the below error is raised.
> Airflow version - 1.10.6
>  MySQL version - 8.0.15
>  MySQLdb (mysqlclient) - tested with 1.3.14 (both installed with pip and built 
> locally) and 1.4.2.post1
> connection extra:
> {code:python}
> {"charset":"utf8mb4"}
> {code}
> DB charset is set to utf8mb4
>  Error message:
> {code:python}
> [2019-11-15 16:55:46,477] {taskinstance.py:1058} ERROR - (2006, "Can't 
> initialize character set unknown (path: compiled_in)")
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 
> 930, in _run_raw_task
> result = task_copy.execute(context=context)
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py",
>  line 113, in execute
> return_value = self.execute_callable()
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py",
>  line 118, in execute_callable
> return self.python_callable(*self.op_args, **self.op_kwargs)
>   File "", line 136, in mysql_***
> mysql_conn = mysql_hook.get_conn()
>   File "/usr/local/lib/python3.7/site-packages/airflow/hooks/mysql_hook.py", 
> line 116, in get_conn
> conn = MySQLdb.connect(**conn_config)
>   File 
> "/usr/local/lib64/python3.7/site-packages/mysqlclient-1.3.14-py3.7-linux-x86_64.egg/MySQLdb/__init__.py",
>  line 85, in Connect
> return Connection(*args, **kwargs)
>   File 
> "/usr/local/lib64/python3.7/site-packages/mysqlclient-1.3.14-py3.7-linux-x86_64.egg/MySQLdb/connections.py",
>  line 208, in __init__
> super(Connection, self).__init__(*args, **kwargs2)
> _mysql_exceptions.OperationalError: (2006, "Can't initialize character set 
> unknown (path: compiled_in)")
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (AIRFLOW-5939) exclude serve_logs from CLI docs

2019-11-15 Thread jack (Jira)
jack created AIRFLOW-5939:
-

 Summary: exclude serve_logs from CLI docs
 Key: AIRFLOW-5939
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5939
 Project: Apache Airflow
  Issue Type: Task
  Components: documentation
Affects Versions: 1.10.6
Reporter: jack
 Fix For: 1.10.7


https://airflow.apache.org/cli.html#serve_logs

Quoting ash on slack:

"We should exclude {{serve_logs}} form that list -- it's run automatically as 
part of {{airflow worker}} so isn't a command users should run or care about."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4789) Add Mysql Exception Handling while pushing data to XCom

2019-11-14 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974327#comment-16974327
 ] 

jack commented on AIRFLOW-4789:
---

What is the exception?
Can you provide an example to reproduce?

> Add Mysql Exception Handling while pushing data to XCom
> ---
>
> Key: AIRFLOW-4789
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4789
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: xcom
>Affects Versions: 1.10.2
>Reporter: raman
>Priority: Minor
>
> We have seen mysql exceptions while setting up xcom data. It would be better 
> to add exception handling there



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4848) MySQL warnings about aborted connections, missing engine disposal

2019-11-14 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974325#comment-16974325
 ] 

jack commented on AIRFLOW-4848:
---

Duplicate of https://issues.apache.org/jira/browse/AIRFLOW-3544?

> MySQL warnings about aborted connections, missing engine disposal
> -
>
> Key: AIRFLOW-4848
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4848
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Reporter: t oo
>Assignee: Daniel Huang
>Priority: Minor
>
> I am not referring to airflow logs in the filesystem. I am referring to logs in 
> the MySQL db itself. This affects airflow 1.10.3, mysql rds 5.7.25
>  
> ie
>  
> 2019-06-25T09:55:25.126187Z 54996343 [Note] Aborted connection 54996343 to 
> db: 'airflowdb' user: 'f' host: 'host' (Got an error reading communication 
> packets)
>  2019-06-25T09:55:25.392705Z 54996375 [Note] Aborted connection 54996375 to 
> db: 'airflowdb' user: 'f' host: 'host' (Got an error reading communication 
> packets)
>  2019-06-25T09:55:25.450276Z 54996240 [Note] Aborted connection 54996240 to 
> db: 'airflowdb' user: 'f' host: 'host' (Got an error reading communication 
> packets)
>  2019-06-25T09:55:25.592741Z 54996391 [Note] Aborted connection 54996391 to 
> db: 'airflowdb' user: 'f' host: 'host' (Got an error reading communication 
> packets)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4066) dag_run failed while one operator is still running

2019-11-14 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974156#comment-16974156
 ] 

jack commented on AIRFLOW-4066:
---

Which airflow version are you running?

> dag_run failed while one operator is still running
> --
>
> Key: AIRFLOW-4066
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4066
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: will-beta
>Priority: Major
> Attachments: image-2019-03-11-20-08-19-690.png
>
>
>  !image-2019-03-11-20-08-19-690.png! 
> It's a BashOperator which has been running a bash script for more than 10 minutes.
> It occurs to me that I once noticed an operator fail while its preceding 
> sensor was still running.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5627) transform function should be optional for s3_file_tranformation_operator

2019-11-14 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974152#comment-16974152
 ] 

jack commented on AIRFLOW-5627:
---

If you don't want to transform, then don't use a transform operator.

An operator should do what it was designed to do.

For your use case of moving files from path to path on S3, use copy_object in 
S3Hook, as in the sketch below.
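
A minimal sketch of that path (bucket and key names are placeholders):

{code:python}
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id='aws_default')
hook.copy_object(source_bucket_key='incoming/data.csv',
                 dest_bucket_key='archive/data.csv',
                 source_bucket_name='my-bucket',
                 dest_bucket_name='my-bucket')
{code}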

 

> transform function should be optional for s3_file_tranformation_operator
> 
>
> Key: AIRFLOW-5627
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5627
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 2.0.0
>Reporter: Ke Zhu
>Priority: Major
>
> h3. What happened
> After AIRFLOW-2299, it asks people to choose either {{transform_expression}} 
> or {{transform_script}} when using S3FileTransformOperator. For a use 
> case like moving objects only, without any content transformation, it has to 
> use some hack like {{transform_script='/bin/cp'}}, which simply copies a temp 
> file to another temp file. 
>  If you use neither parameter, it will throw exception saying {{Either 
> transform_script or select_expression must be specified}}. See 
> [https://github.com/apache/airflow/blob/d719e1fd6705a93a0dfefef4b46478ade5e006ea/airflow/operators/s3_file_transform_operator.py#L110-L112]
> h3. Expected outcome
> An enhancement like -AIRFLOW-2299- should not force users to use a newly added 
> feature like transform_expression/transform_script or choose a hacky path to 
> work around it. These two parameters should just be optional.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-2572) Airflow /admin/configurationview/ Running Configuration not accurate

2019-11-14 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974119#comment-16974119
 ] 

jack commented on AIRFLOW-2572:
---

[~joejasinski] can you recheck this with a newer version of Airflow?

> Airflow /admin/configurationview/ Running Configuration not accurate
> 
>
> Key: AIRFLOW-2572
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2572
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
> Environment: Debian 8 (in a Docker container)
>Reporter: Joe Jasinski
>Priority: Minor
>
> The admin Runtime Configuration section of /admin/configurationview/ is 
> showing config settings from the config file but not the environment 
> variables set. 
> When that page renders, this loop is called, which uses the config.as_dict() 
> method:
> https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L2868
> The conf.as_dict() method doesn't properly read in from the environment 
> variables. In the example below, see how we can get the expected value from 
> conf.get() but not from conf.as_dict().  I think it has to do with the 
> implementation of as_dict()
> export 
> AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:@postgres:5432/airflow
> >>> from airflow import configuration as conf
> >>> import os
> >>> os.environ["AIRFLOW__CORE__SQL_ALCHEMY_CONN"]
> 'postgresql+psycopg2://airflow:@postgres:5432/airflow'
> >>> dict(conf.as_dict(True, True))['core']['sql_alchemy_conn']
> ('sqlite:usr/local/airflow/airflow/airflow.db', 'bash cmd')
> >>> conf.get('core', 'sql_alchemy_conn')
> 'postgresql+psycopg2://airflow:@postgres:5432/airflow'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1753) Can't install on windows 10

2019-11-07 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969386#comment-16969386
 ] 

jack commented on AIRFLOW-1753:
---

pwd is a built-in module (it comes with the Python installation) for Unix-like 
OSes only. For Windows, maybe winpwd can work.

> Can't install on windows 10
> ---
>
> Key: AIRFLOW-1753
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1753
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Lakshman Udayakantha
>Priority: Major
>
> When I installed airflow using the "pip install airflow" command, two errors 
> popped up.
> 1. link.exe failed with exit status 1158
> 2. \x86_amd64\\cl.exe' failed with exit status 2
> The first issue can be solved by referring to 
> https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421.
> But the second issue is still there, and googling turned up no solution either. 
> How to prevent that issue and install airflow on Windows 10 x64?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3565) 404 response is unreachable

2019-11-07 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969052#comment-16969052
 ] 

jack commented on AIRFLOW-3565:
---

Is this still an issue? We are 6 releases after 1.10.0

> 404 response is unreachable
> ---
>
> Key: AIRFLOW-3565
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3565
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10.0
>Reporter: Nimrod Morag
>Priority: Minor
>  Labels: easyfix, newbie
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> when making a bad request (e.g. a non-existent URL), you get this error:
> webserver_1 | [2018-12-25 11:13:53 +] [667] [ERROR] Error handling 
> request /health
> webserver_1 | Traceback (most recent call last):
> webserver_1 | File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, 
> in handle
> webserver_1 | self.handle_request(listener, req, client, addr)
> webserver_1 | File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, 
> in handle_request
> webserver_1 | respiter = self.wsgi(environ, resp.start_response)
> webserver_1 | File "/usr/local/lib/python3.5/site-packages/werkzeug/wsgi.py", 
> line 826, in __call__
> webserver_1 | return app(environ, start_response)
> webserver_1 | File 
> "/usr/local/lib/python3.5/site-packages/airflow/www/app.py", line 173, in 
> root_app
> webserver_1 | resp(b'404 Not Found', [(b'Content-Type', b'text/plain')])
> webserver_1 | File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/http/wsgi.py", line 253, in 
> start_response
> webserver_1 | self.process_headers(headers)
> webserver_1 | File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/http/wsgi.py", line 260, in 
> process_headers
> webserver_1 | raise TypeError('%r is not a string' % name)
> webserver_1 | TypeError: b'Content-Type' is not a string
>  
> which raises a 500 Internal Server Error in the browser instead of showing the 
> hard-coded 404 response, because Gunicorn expects strings in headers and 
> Airflow sends bytes
>  
> site-packages/airflow/www/app.py lines 172-174
> def root_app(env, resp):
>     resp(b'404 Not Found', [(b'Content-Type', b'text/plain')])
>     return [b'Apache Airflow is not at this location']
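A minimal sketch of the suggested fix: the status and header values become native strings, while the response body itself stays bytes as WSGI requires:

{code:python}
def root_app(env, resp):
    resp('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'Apache Airflow is not at this location']
{code}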



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-330) Decorated PythonOperator python_callable functions don't show the original function in task code view

2019-11-06 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968374#comment-16968374
 ] 

jack commented on AIRFLOW-330:
--

If it's only a Python 2 issue, then this probably isn't needed anymore.

> Decorated PythonOperator python_callable functions don't show the original 
> function in task code view
> -
>
> Key: AIRFLOW-330
> URL: https://issues.apache.org/jira/browse/AIRFLOW-330
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Jon McKenzie
>Priority: Minor
>
> In Python 3.4 or below, if you try to decorate the {{python_callable}} of a 
> {{PythonOperator}} in the following manner (i.e. like the manual application 
> of a standard Python decorator using {{functools.wraps}}):
> {noformat}
> task.python_callable = wrap(task.python_callable)
> {noformat}
> ...the code view of that task in the web UI shows the code for the {{wrap}} 
> function rather than the initial {{python_callable}}. 
> The fix is to run something like this (where {{inspect.unwrap}} is available 
> in Python 3.4+):
> {noformat}
> inspect.getsource(inspect.unwrap(func))
> {noformat}
> ...rather than:
> {noformat}
> inspect.getsource(func)
> {noformat}
> I'm not sure if this is something worth fixing or not, since I believe Python 
> 3.5+ implements the above fix (although I believe it would still be an issue 
> in Python 2.x).
> Just for some background, I'm writing a higher level API around Airflow that 
> takes tasks as arguments and connects their inputs via {{XCom}} (among other 
> things). The callables I want my API users to write aren't going to need 
> access to any of the task context (only so that they don't need to know 
> Airflow internals), hence the need to decorate them appropriately.
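A minimal self-contained sketch of the proposed fix ({{wrap}} and {{my_callable}} are made-up names):

{code:python}
import functools
import inspect

def wrap(func):
    @functools.wraps(func)  # records the original under __wrapped__
    def inner(*args, **kwargs):
        return func(*args, **kwargs)
    return inner

def my_callable():
    pass

wrapped = wrap(my_callable)

# inspect.unwrap (Python 3.4+) follows the __wrapped__ chain, so the
# original function's source is recovered instead of inner's:
print(inspect.getsource(inspect.unwrap(wrapped)))
{code}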



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3242) execution_date for TriggerDagRunOperator should be based from Triggering dag

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967657#comment-16967657
 ] 

jack commented on AIRFLOW-3242:
---

[~zhoufengzd] do you still experience the issue? I worked with backfill and 
TriggerDagRunOperator and didn't experience this problem. 

> execution_date for TriggerDagRunOperator should be based from Triggering dag
> 
>
> Key: AIRFLOW-3242
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3242
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.8.2, 1.9.0, 1.10.0
> Environment: any linux / mac os
>Reporter: Feng Zhou
>Priority: Major
>
> TriggerDagRunOperator should pick up execution_date from context instead of 
> just defaulting to today. This breaks backfilling logic if TriggerDagRunOperator 
> is used.
> Could simply add one line to address this issue, see the highlighted line below 
> (around line 70):
> {code:python}
> def execute(self, context):
>     ...
>     dr = trigger_dag.create_dagrun(
>         run_id=dro.run_id,
>         state=State.RUNNING,
>         execution_date=context['execution_date'],  # proposed addition
>         conf=dro.payload,
>         external_trigger=True)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3507) Fix Airflow k8s CI

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967653#comment-16967653
 ] 

jack commented on AIRFLOW-3507:
---

This was fixed long ago, wasn't it?

> Fix Airflow k8s CI
> --
>
> Key: AIRFLOW-3507
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3507
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tao Feng
>Priority: Major
>
> The k8s ci is failed with two 
> tests([https://travis-ci.org/apache/incubator-airflow/jobs/467343912).] Fix 
> the tests to unblock airflow k8s ci.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3666) ExternalTaskSensor is not triggering the task

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967651#comment-16967651
 ] 

jack commented on AIRFLOW-3666:
---

What version of airflow are you running?
Can you check against 1.10.6?

> ExternalTaskSensor is not triggering the task
> -
>
> Key: AIRFLOW-3666
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3666
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Reporter: Darshan Mehta
>Priority: Major
>
> Dependent task is not getting triggered even after upstream finishes 
> successfully.
> master DAG:
> {code:java}
> from airflow import DAG
> from airflow.operators.jdbc_operator import JdbcOperator
> from datetime import datetime
> from airflow.operators.bash_operator import BashOperator
> today = datetime.today()
> default_args = {
> 'depends_on_past': False,
> 'retries': 0,
> 'start_date': datetime(today.year, today.month, today.day),
> 'schedule_interval': '@once'
> }
> dag = DAG('call-procedure-and-bash', default_args=default_args)
> call_procedure = JdbcOperator(
> task_id='call_procedure',
> jdbc_conn_id='airflow_db2',
> sql='CALL AIRFLOW.TEST_INSERT (20)',
> dag=dag
> )
> call_procedure
> {code}
> Dependent DAG:
> {code:java}
> from airflow import DAG
> from airflow.operators.jdbc_operator import JdbcOperator
> from datetime import datetime, timedelta
> from airflow.sensors.external_task_sensor import ExternalTaskSensor
> today = datetime.today()
> default_args = {
> 'depends_on_past': False,
> 'retries': 0,
> 'start_date': datetime(today.year, today.month, today.day),
> 'schedule_interval': '@once'
> }
> dag = DAG('external-dag-upstream', default_args=default_args)
> task_sensor = ExternalTaskSensor(
> task_id='link_upstream',
> external_dag_id='call-procedure-and-bash',
> external_task_id='call_procedure',
> execution_delta=timedelta(minutes=-2),
> dag=dag
> )
> count_rows = JdbcOperator(
> task_id='count_rows',
> jdbc_conn_id='airflow_db2',
> sql='SELECT COUNT(*) FROM AIRFLOW.TEST',
> dag=dag
> )
> count_rows.set_upstream(task_sensor)
> {code}
> Master DAG executes successfully whereas downstream DAG gets stuck in 
> 'Poking' state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3915) Scheduler fails for dags with datetime start_date

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967642#comment-16967642
 ] 

jack commented on AIRFLOW-3915:
---

If this is still an issue, you are welcome to create a PR.

> Scheduler fails for dags with datetime start_date
> -
>
> Key: AIRFLOW-3915
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3915
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 1.10.2
>Reporter: David Stuck
>Priority: Major
>
> When start_date is passed in to a dag as a datetime object, it does not get 
> converted to pendulum and thus the scheduler fails at 
> [https://github.com/apache/airflow/blob/4083a8f5217e9ca7a5c83a3eaaaf403dd367a90c/airflow/models.py#L3487]
>  when trying to access `self.timezone.name`.
> My guess is that the fix is as simple as setting `self.timezone = 
> pendulum.instance(self.timezone)` in `__init__`. If that sounds right I can 
> create a PR.
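Until that is fixed, a minimal sketch of the usual workaround: hand the DAG a timezone-aware start_date built with pendulum up front:

{code:python}
from datetime import datetime

import pendulum

local_tz = pendulum.timezone('UTC')  # or any named timezone

default_args = {
    'start_date': datetime(2019, 1, 1, tzinfo=local_tz),
}
{code}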



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5814) Implementing Presto hook tests

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967621#comment-16967621
 ] 

jack commented on AIRFLOW-5814:
---

Merged. Jira can be closed

> Implementing Presto hook tests
> --
>
> Key: AIRFLOW-5814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5814
> Project: Apache Airflow
>  Issue Type: Test
>  Components: hooks
>Affects Versions: 1.10.5
>Reporter: Sayed Mohammad Hossein Torabi
>Assignee: Sayed Mohammad Hossein Torabi
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (AIRFLOW-4881) Zombie collection fails task instances that should be scheduled for retry

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967518#comment-16967518
 ] 

jack edited comment on AIRFLOW-4881 at 11/5/19 1:15 PM:


Resolved in  [https://github.com/apache/airflow/pull/5511] ?


was (Author: jackjack10):
Resolved in [https://github.com/apache/airflow/pull/5514] ?

> Zombie collection fails task instances that should be scheduled for retry
> -
>
> Key: AIRFLOW-4881
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4881
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.3
>Reporter: Tamas Flamich
>Priority: Major
>
> In case a task instance
>  * has more attempts than retries (it can happen when the state of the task 
> instance is explicitly cleared) and
>  * the task instance is prematurely terminated (without graceful shutdown)
> then the zombie collection process of the scheduler can mark the task instance 
> failed instead of retrying it. 
> Steps to reproduce:
> 1 - The task is scheduled for a particular execution date and the following 
> record gets created in the database.
> ||task_id||retries||try_number||max_tries||state||
> |emr_sensor|2|1|2|running|
> 2 - The job owners would like to schedule the task again, therefore they clear 
> the state of the task instance. {{try_number}} and {{max_tries}} get 
> updated.
> ||task_id||retries||try_number||max_tries||state||
> |emr_sensor|2|2|3|running|
> 3 - The Airflow scheduler gets killed and a new scheduler instance starts 
> looking for zombie tasks. Since {{try_number < max_tries}}, the new state is 
> {{up_for_retry}}. However, there is a bug in the [state update 
> logic|https://github.com/apache/airflow/blob/d5a5b9d9f1f1efb67ffed4d8e6ef3e0a06467bed/airflow/models/dagbag.py#L295]
>  that will revert the {{max_tries}} value to the initial value ({{retries}}).
> ||task_id||retries||try_number||max_tries||state||
> |emr_sensor|2|2|2|up_for_retry|
> 4 - During the next iteration of the scheduler, the task instance gets picked 
> up. However, since {{try_number >= max_tries}}, the new state is {{failed}}.
> ||task_id||retries||try_number||max_tries||state||
> |emr_sensor|2|2|2|failed|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-2732) Split hooks and operators out from core Airflow

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967517#comment-16967517
 ] 

jack commented on AIRFLOW-2732:
---

Do we need this? We are now moving stuff in the other direction, from contrib to 
core.

> Split hooks and operators out from core Airflow
> ---
>
> Key: AIRFLOW-2732
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2732
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci, hooks, operators, tests
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Major
>
> The goal of this issue is to split out hooks and operators from the core 
> incubator-airflow repo into a second repo to facilitate:
>  # faster CI builds of core
>  # more frequent releases of hooks & operators by decoupling them from core
> We have discussed this work with Max last month and it's also been mentioned 
> on the Airflow Dev list in the thread "Apache Airflow 1.10.0b3".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4881) Zombie collection fails task instances that should be scheduled for retry

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967518#comment-16967518
 ] 

jack commented on AIRFLOW-4881:
---

Resolved in [https://github.com/apache/airflow/pull/5514] ?

> Zombie collection fails task instances that should be scheduled for retry
> -
>
> Key: AIRFLOW-4881
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4881
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.3
>Reporter: Tamas Flamich
>Priority: Major
>
> In case a task instance
>  * has more attempts than retries (it can happen when the state of the task 
> instance is explicitly cleared) and
>  * the task instance is prematurely terminated (without graceful shutdown)
> then the zombie collection process of the scheduler can mark the task instance 
> failed instead of retrying it. 
> Steps to reproduce:
> 1 - The task is scheduled for a particular execution date and the following 
> record gets created in the database.
> ||task_id||retries||try_number||max_tries||state||
> |emr_sensor|2|1|2|running|
> 2 - The job owners would like to schedule the task again, therefore they clear 
> the state of the task instance. {{try_number}} and {{max_tries}} get 
> updated.
> ||task_id||retries||try_number||max_tries||state||
> |emr_sensor|2|2|3|running|
> 3 - The Airflow scheduler gets killed and a new scheduler instance starts 
> looking for zombie tasks. Since {{try_number < max_tries}}, the new state is 
> {{up_for_retry}}. However, there is a bug in the [state update 
> logic|https://github.com/apache/airflow/blob/d5a5b9d9f1f1efb67ffed4d8e6ef3e0a06467bed/airflow/models/dagbag.py#L295]
>  that will revert the {{max_tries}} value to the initial value ({{retries}}).
> ||task_id||retries||try_number||max_tries||state||
> |emr_sensor|2|2|2|up_for_retry|
> 4 - During the next iteration of the scheduler, the task instance gets picked 
> up. However, since {{try_number >= max_tries}}, the new state is {{failed}}.
> ||task_id||retries||try_number||max_tries||state||
> |emr_sensor|2|2|2|failed|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3098) Dag run succeeds (all related tasks green) yet airflow marks it as failed

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967515#comment-16967515
 ] 

jack commented on AIRFLOW-3098:
---

Can you provide code to reproduce?

> Dag run succeeds (all related tasks green) yet airflow marks it as failed
> -
>
> Key: AIRFLOW-3098
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3098
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws, DagRun
>Affects Versions: 1.10.0
>Reporter: Yassine
>Priority: Major
>
> The setup is a standard airflow install backed by sqlite db running a 
> sequential executor. I created a simple dag with linear task layout as 
> follows:
>  
> *[ Create an EMR cluster ] >> [ submit first EMR step ] >> [ check first step 
> status ] >> [ submit second EMR step ] >> [ check second step status ]* 
>  
> All tasks finish and are green (no errors in the task logs), yet airflow marks 
> the dag run as failed. I have no clue why.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3032) _pickle.UnpicklingError with using remote MySQL Server

2019-11-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967484#comment-16967484
 ] 

jack commented on AIRFLOW-3032:
---

[~ash] can be closed

> _pickle.UnpicklingError with using remote MySQL Server
> --
>
> Key: AIRFLOW-3032
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3032
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.9.0
>Reporter: Max Richter
>Priority: Blocker
> Attachments: error_log.txt, pip_list.txt
>
>
> Hello,
> I am running Airflow 1.9.0 successfully with a localhost MySQL database, 
> version 5.7.23.
> I switched sql_alchemy_conn = 
> mysql://airflow:@:3306/airflow in order to use the 
> proper MySQL server - same version 5.7.23.
> I created a dump from my local instance to the remote one.
> Issue:
>  * When tasks are executed by the scheduler everything runs fine, tasks are 
> executed and DB updated
>  * When manually triggering a task via the webserver, I am getting 
> "_pickle.UnpicklingError" please see error__log.txt for full log
> In the end, I only changed this one line in airflow.cfg, which prevents me 
> from using Airflow with a remote MySQL server.
>  
> Best,
> Max



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (AIRFLOW-5803) Rename S3Hook to AWSS3Hook

2019-10-29 Thread jack (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-5803:
--
Comment: was deleted

(was: It doesn't need to be renamed.

[~basph] is working on moving AWS operators/hooks to their own path

so the hook will be accessed via /providers/aws [AIP-21])

> Rename S3Hook to AWSS3Hook
> --
>
> Key: AIRFLOW-5803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5803
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 2.0.0
>Reporter: MinJae Kwon
>Assignee: MinJae Kwon
>Priority: Minor
> Fix For: 2.0.0
>
>
> S3Hook class should be AWSS3Hook by conventions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5803) Rename S3Hook to AWSS3Hook

2019-10-29 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962050#comment-16962050
 ] 

jack commented on AIRFLOW-5803:
---

It doesn't need to be renamed.

[~basph] is working on moving AWS operators/hooks to their own path

so the hook will be accessed via /providers/aws [AIP-21]

> Rename S3Hook to AWSS3Hook
> --
>
> Key: AIRFLOW-5803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5803
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 2.0.0
>Reporter: MinJae Kwon
>Assignee: MinJae Kwon
>Priority: Minor
> Fix For: 2.0.0
>
>
> S3Hook class should be AWSS3Hook by conventions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3514) Documentation for run_query slightly off for bigquery_hook

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961538#comment-16961538
 ] 

jack commented on AIRFLOW-3514:
---

[~kamil.bregula] [~kaxilnaik]  can be closed?

> Documentation for run_query slightly off for bigquery_hook
> --
>
> Key: AIRFLOW-3514
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3514
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.10.0
>Reporter: joyce chan
>Priority: Trivial
>
> The python docs for the run_query method of BigQueryHook says
>  
> {code:java}
> :param query_params a dictionary containing query parameter types and values, 
> passed to BigQuery   
> :type query_params: dict{code}
>  
> but it should be an array of dictionary, according to the documentation
> https://cloud.google.com/bigquery/docs/parameterized-queries#bigquery-query-params-arrays-api



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1663) Redshift Connection, Hook, & Operator for COPY command usability

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961540#comment-16961540
 ] 

jack commented on AIRFLOW-1663:
---

Possibly what was meant to be done on:

https://issues.apache.org/jira/browse/AIRFLOW-5338

 

> Redshift Connection, Hook, & Operator for COPY command usability
> 
>
> Key: AIRFLOW-1663
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1663
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks, operators
>Reporter: Andy Hadjigeorgiou
>Assignee: Andy Hadjigeorgiou
>Priority: Minor
>
> I'm using Redshift as a data warehouse in conjunction with Airflow, and I've 
> found that it wasn't immediately apparent that Airflow had the 
> hooks/connections to support Redshift. In practice, because Redshift is based 
> off of Postgres, a Postgres hook works for basic commands. However, when 
> running a COPY command (uniquely built in Redshift to copy data in parallel), 
> more work is necessary to include AWS credentials (ideally credentials aren't 
> in version control, but in a connection). Redshift's unloading to s3 feature 
> would also benefit from a solution where credentials could be stored in a 
> connection.
> My proposed solution is to include a Redshift connection, that will allow us 
> to include AWS credentials along with Redshift db connection credentials 
> (similar to an S3 connection). From here, I'll create an appropriate 
> RedshiftHook (probably an extension of PostgresHook), and a RedshiftOperator, 
> with means to simplify Redshift sql queries with AWS credentials (& perhaps 
> using psycopg2's copy_expert method).
> It's my first time posting here, and I'm looking to contribute meaningfully - 
> any feedback regarding this feature would be much appreciated! I read that 
> features which involve contributing to new hooks & operators are welcome, and 
> features in line with project Roadmap are ideal ("Adding features already 
> offered by existing workflow solutions (i.e we need to add expected 
> features"). Currently, Airflow only supports Redshift because of it's basis 
> on Postgres, but more native support will be in line with the features of 
> other workflow solutions, and attract more Redshift users.
> I've already started work on this feature, once I clean it up I'll post it 
> here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4734) Upsert functionality for PostgresHook.insert_rows()

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961532#comment-16961532
 ] 

jack commented on AIRFLOW-4734:
---

[~oxymor0n] ahh :/

did you make any progress with it?

> Upsert functionality for PostgresHook.insert_rows()
> ---
>
> Key: AIRFLOW-4734
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4734
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.3
>Reporter: William Tran
>Assignee: William Tran
>Priority: Minor
>  Labels: features
> Fix For: 2.0.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> PostgresHook's parent class, DbApiHook, implements upsert in its 
> insert_rows() method with the replace=True flag. However, the underlying 
> generated SQL is specific to MySQL's "REPLACE INTO" syntax and is not 
> applicable to Postgres.
> I'd like to override this method in PostgresHook to implement the "INSERT ... 
> ON CONFLICT DO UPDATE" syntax (new since Postgres 9.5)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4560) Tez queue parameter passed by mapred_queue is incorrect

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961535#comment-16961535
 ] 

jack commented on AIRFLOW-4560:
---

[~ash] merged ages ago but still open :) 1.10.7?

> Tez queue parameter passed by mapred_queue is incorrect
> ---
>
> Key: AIRFLOW-4560
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4560
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Reporter: Alice Berard
>Priority: Major
>
> The parameter is currently {{tez.job.queue.name}}, see code: 
> [https://github.com/apache/airflow/blob/355bd56282e6a684c5c060953e9948ba2260aa37/airflow/hooks/hive_hooks.py#L214]
> But it should be {{tez.queue.name}}, see here: 
> [https://tez.apache.org/releases/0.9.2/tez-api-javadocs/configs/TezConfiguration.html]
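> 
> For reference, the queue can also be set explicitly through {{hive_conf}} 
> using the correct parameter name (a sketch assuming HiveCliHook.run_cli; the 
> connection id and queue name are illustrative):
> {code:python}
> from airflow.hooks.hive_hooks import HiveCliHook
> 
> hook = HiveCliHook(hive_cli_conn_id='hive_cli_default')
> # Pass the Tez queue directly, bypassing the broken mapred_queue mapping.
> hook.run_cli('SELECT 1;', hive_conf={'tez.queue.name': 'etl'})
> {code}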



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5616) PrestoHook to use prestodb

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961528#comment-16961528
 ] 

jack commented on AIRFLOW-5616:
---

[~brilhana] You can PR when ready

> PrestoHook to use prestodb
> --
>
> Key: AIRFLOW-5616
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5616
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.5
>Reporter: Alexandre Brilhante
>Priority: Minor
>
> PrestoHook currently uses PyHive which doesn't support transactions whereas 
> prestodb 
> ([https://github.com/prestodb/presto-python-client)|https://github.com/prestodb/presto-python-client]
>  does. I think it would be more flexible to use prestodb as the client. I can 
> work on a PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5286) Add requeue logic to airflow scheduler and executor

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961526#comment-16961526
 ] 

jack commented on AIRFLOW-5286:
---

Sounds useful! Are you planning to PR your code?

> Add requeue logic to airflow scheduler and executor
> ---
>
> Key: AIRFLOW-5286
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5286
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executors, scheduler
>Affects Versions: 1.10.4
>Reporter: Yingbo Wang
>Assignee: Yingbo Wang
>Priority: Major
>
> Airflow queued tasks sometimes get stuck for a long time without being picked 
> up. In many cases the root cause is hard to debug. 
> The proposed solution here is to add requeue logic to the airflow scheduler 
> and executor: for tasks in the queued state in the metadata DB, if the task 
> does not show up in executor.queued_tasks within a certain interval, we 
> requeue it so that stuck tasks can be released and picked up. This solution 
> was used at Airbnb and proved helpful. 
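> 
> A rough sketch of the proposed check (hypothetical helper name and interval, 
> not the Airbnb implementation; assumes the TI's queued_dttm column):
> {code:python}
> from datetime import datetime, timedelta
> 
> from airflow.models import TaskInstance
> from airflow.utils.state import State
> 
> def requeue_stuck_tasks(session, executor, interval=timedelta(minutes=10)):
>     # Find TIs queued in the metadata DB for longer than `interval`.
>     cutoff = datetime.utcnow() - interval
>     queued = (session.query(TaskInstance)
>               .filter(TaskInstance.state == State.QUEUED,
>                       TaskInstance.queued_dttm < cutoff))
>     for ti in queued:
>         # If the executor no longer knows about the task, release it so
>         # the scheduler can queue it again.
>         if ti.key not in executor.queued_tasks:
>             ti.state = State.SCHEDULED
>     session.commit()
> {code}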



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5290) Add is updated before and between to GoogleCloudStorageHook

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961522#comment-16961522
 ] 

jack commented on AIRFLOW-5290:
---

[~kamil.bregula] GCP related but no component is set

> Add is updated before and between to GoogleCloudStorageHook
> ---
>
> Key: AIRFLOW-5290
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5290
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.4
>Reporter: Derrik
>Assignee: Derrik
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> GoogleCloudStorageHook provides a function to test whether a blob was updated 
> after a given timestamp, but not before a timestamp or between two timestamps.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5333) Move __init__ docstring to class docstring in PubSub

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961521#comment-16961521
 ] 

jack commented on AIRFLOW-5333:
---

[~kamil.bregula]  Jira is still open...

Can you also add the doc component so it will show as a doc change in the 
changelog?

> Move __init__ docstring to class docstring in PubSub
> 
>
> Key: AIRFLOW-5333
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5333
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (AIRFLOW-3140) Add MongoDBToGoogleStorage Opearator

2019-10-28 Thread jack (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack reassigned AIRFLOW-3140:
-

Assignee: lightQ  (was: Tanay Tummalapalli)

> Add MongoDBToGoogleStorage Opearator
> 
>
> Key: AIRFLOW-3140
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3140
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: gcp, operators
>Affects Versions: 1.10.0
>Reporter: jack
>Assignee: lightQ
>Priority: Minor
>
> Airflow has Mongo Hook
> [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/mongo_hook.py]
>  
> Please also add an operator that transfers from MongoDB to Google Storage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5338) Add a RedsfhitToS3Operator

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961520#comment-16961520
 ] 

jack commented on AIRFLOW-5338:
---

[~feluelle] I think it's a refactoring task: something like a BaseSQLToS3 
operator that RedshiftToS3Operator would inherit from, similar to what we have 
in GCP with BaseSQLToGoogleCloudStorageOperator.

> Add a RedsfhitToS3Operator
> --
>
> Key: AIRFLOW-5338
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5338
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: aws
>Affects Versions: 1.10.5
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Create an Airflow operator that queries Redshift and persists the results to 
> S3. We should be able to leverage the existing code in 
> https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py
>  to handle the flush-to-s3 logic. We should abstract that logic into a base 
> class and let RedshiftToS3Operator and DynamodbToS3Operator inherit from that 
> base class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3140) Add MongoDBToGoogleStorage Opearator

2019-10-28 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961514#comment-16961514
 ] 

jack commented on AIRFLOW-3140:
---

[~quichu] changed the assignee to you

> Add MongoDBToGoogleStorage Opearator
> 
>
> Key: AIRFLOW-3140
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3140
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: gcp, operators
>Affects Versions: 1.10.0
>Reporter: jack
>Assignee: Q C
>Priority: Minor
>
> Airflow has Mongo Hook
> [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/mongo_hook.py]
>  
> Please also add an operator that transfers from MongoDB to Google Storage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (AIRFLOW-3140) Add MongoDBToGoogleStorage Opearator

2019-10-28 Thread jack (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack reassigned AIRFLOW-3140:
-

Assignee: Q C  (was: lightQ)

> Add MongoDBToGoogleStorage Opearator
> 
>
> Key: AIRFLOW-3140
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3140
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: gcp, operators
>Affects Versions: 1.10.0
>Reporter: jack
>Assignee: Q C
>Priority: Minor
>
> Airflow has Mongo Hook
> [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/mongo_hook.py]
>  
> Please also add an operator that transfers from MongoDB to Google Storage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-4927) Airflow task stuck in Scheduled mode due to pool not existed

2019-10-28 Thread jack (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-4927:
--
Affects Version/s: 1.10.6
   1.10.5

> Airflow task stuck in Scheduled mode due to pool not existed
> 
>
> Key: AIRFLOW-4927
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4927
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.4, 1.10.5, 1.10.6
>Reporter: jack
>Priority: Major
>
> When an operator uses a pool, the scheduler schedules the task only when a 
> slot is available.
> We encountered a case where one of our developers did:
>  
>  
> {code:java}
> op = CustomOperator(
>     task_id='id',
>     conn_id='con',
>     pool='file_default'){code}
>  
>  
> However, file_default was not defined in Pools.
> It took us hours to find that the pool does not exist. No error nor any 
> indication was given.
>  
> Airflow should raise a broken DAG exception when a task uses a pool that 
> doesn't exist, much like it does when accessing a Variable that doesn't exist.
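> 
> A sketch of the proposed validation (hypothetical helper; where exactly it 
> would hook into DAG parsing is left open):
> {code:python}
> from airflow.exceptions import AirflowException
> from airflow.models import Pool
> from airflow.utils.db import provide_session
> 
> @provide_session
> def validate_pool(pool_name, session=None):
>     # Fail loudly when a task references a pool that was never created.
>     if not session.query(Pool).filter(Pool.pool == pool_name).first():
>         raise AirflowException("Pool '%s' does not exist" % pool_name)
> {code}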



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3978) Add missing types in MySqlToGoogleCloudStorageOperator

2019-10-05 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945135#comment-16945135
 ] 

jack commented on AIRFLOW-3978:
---

The question is what happens if your MySQL table has a BINARY field and you use 
MySqlToGoogleCloudStorageOperator and then load the result into BigQuery.

Will it be recognized as {{BYTES}} in BigQuery with auto-detect?

> Add missing types in MySqlToGoogleCloudStorageOperator
> --
>
> Key: AIRFLOW-3978
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3978
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.2
>Reporter: Roster
>Assignee: Roster
>Priority: Minor
>  Labels: gcs
>
> There fields are missing and can not be mapped: 
> TIME, BINARY , VARBINARY



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5394) Invalid schedule interval issues

2019-10-03 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943362#comment-16943362
 ] 

jack commented on AIRFLOW-5394:
---

Cron validation was introduced in [https://github.com/apache/airflow/pull/3698] 
by [~xddeng]

> Invalid schedule interval issues
> 
>
> Key: AIRFLOW-5394
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5394
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun, scheduler
>Affects Versions: 1.10.2
>Reporter: Shreyash hisariya
>Priority: Major
>
> I am facing issues with Airflow's schedule interval. Since there is no 
> documentation at all, it took me a few days to find that the cron expression 
> accepts only 5 or 6 fields.
> Even with 5 or 6 fields, the dag fails multiple times. 
> *For example: Invalid Cron expression: [0 15 10 * * ?] is not acceptable* 
> The above cron is valid, but airflow doesn't accept it.
>  # Can you please put up documentation of what is and what is not valid? 
>  # Why doesn't airflow accept more than 6 fields?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5386) Move Google Dataproc to core

2019-10-03 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943358#comment-16943358
 ] 

jack commented on AIRFLOW-5386:
---

PR was merged, can Jira be closed?

> Move Google Dataproc to core
> 
>
> Key: AIRFLOW-5386
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5386
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5500) Bug in trigger api endpoint

2019-10-02 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943345#comment-16943345
 ] 

jack commented on AIRFLOW-5500:
---

Can you PR the fix?

> Bug in trigger api endpoint 
> 
>
> Key: AIRFLOW-5500
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5500
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 1.10.1
>Reporter: Deavarajegowda M T
>Priority: Critical
> Attachments: 3level.py
>
>
> Unable to trigger a workflow with nested sub dags; getting the following error:
>  sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key value 
> (dag_id,execution_date)=('dummy.task1.task_level1.task_level2','2019-09-10 
> 13:00:27+00:00') violates unique constraint 
> "dag_run_dag_id_execution_date_key"
>  trigger_dag for nested sub_dags is called twice.
>  
> fix:
> in airflow/api/common/experimental/trigger_dag.py -
> while populating subdags for a dag, each subdag's subdags are also populated 
> into the main dag,
> so there is no need to repopulate subdags for each subdag separately.
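> 
> A sketch of that idea (assuming the experimental API's surroundings, with 
> run_id and execution_date already in scope; not the actual patch):
> {code:python}
> from airflow.utils.state import State
> 
> # DAG.subdags already collects nested subdags recursively, so one pass
> # over [dag] + dag.subdags triggers each level exactly once.
> for d in [dag] + dag.subdags:
>     d.create_dagrun(
>         run_id=run_id,
>         execution_date=execution_date,
>         state=State.RUNNING,
>         external_trigger=True,
>     )
> {code}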



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-3032) _pickle.UnpicklingError with using remote MySQL Server

2019-10-02 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943343#comment-16943343
 ] 

jack commented on AIRFLOW-3032:
---

Is this still an issue?

> _pickle.UnpicklingError with using remote MySQL Server
> --
>
> Key: AIRFLOW-3032
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3032
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.9.0
>Reporter: Max Richter
>Priority: Blocker
> Attachments: error_log.txt, pip_list.txt
>
>
> Hello,
> I am running Airflow 1.9.0 successfully with a localhost MySQL database, 
> version 5.7.23.
> I switched sql_alchemy_conn = 
> mysql://airflow:@:3306/airflow in order to use the 
> proper MySQL server - same version 5.7.23.
> I created a dump from my local instance to the remote one.
> Issue:
>  * When tasks are executed by the scheduler everything runs fine, tasks are 
> executed and DB updated
>  * When manually triggering a task via the webserver, I am getting 
> "_pickle.UnpicklingError" please see error__log.txt for full log
> In the end, I only changed this one line in airflow.cfg, which prevents me 
> from using Airflow with a remote MySQL server.
>  
> Best,
> Max



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5121) Normalize *_conn_id parameters

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941291#comment-16941291
 ] 

jack commented on AIRFLOW-5121:
---

Oh... I didn't notice subtasks are opened as we go. I'm used to you guys 
generating the whole list in advance :D

> Normalize *_conn_id parameters
> --
>
> Key: AIRFLOW-5121
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5121
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks, operators
>Affects Versions: 1.10.3
>Reporter: Kamil Bregula
>Priority: Major
>
> Hello,
>  
> There are many ways to pass the connection ID to hooks and operators. We 
> should introduce harmony in this matter.
> List of all possible combinations.
> {code:java}
> 217 gcp_conn_id=
> 40 aws_conn_id=
> 22 google_cloud_storage_conn_id=
> 21 bigquery_conn_id=
> 12 dingding_conn_id=
> 10 ssh_conn_id=
> 6 qubole_conn_id=
> 5 postgres_conn_id=
> 4 presto_conn_id=
> 4 http_conn_id=
> 3 jdbc_conn_id=
> 3 azure_data_lake_conn_id=
> 2 wasb_conn_id=
> 2 vertica_conn_id=
> 2 spark_conn_id=
> 2 snowflake_conn_id=
> 2 sftp_conn_id=
> 2 segment_conn_id=
> 2 redis_conn_id=
> 2 jira_conn_id=
> 2 imap_conn_id=
> 2 hive_cli_conn_id=
> 2 gcp_cloudsql_conn_id=
> 2 emr_conn_id=
> 2 druid_ingest_conn_id=
> 2 dest_gcs_conn_id=
> 2 datastore_conn_id=
> 2 databricks_conn_id=
> 2 azure_cosmos_conn_id=
> 1 sqlite_conn_id=
> 1 slack_conn_id=
> 1 samba_conn_id=
> 1 registry_conn_id=
> 1 pig_cli_conn_id=
> 1 oracle_conn_id=
> 1 opsgenie_conn_id=
> 1 mysql_conn_id=
> 1 mssql_conn_id=
> 1 metastore_conn_id=
> 1 grpc_conn_id=
> 1 druid_broker_conn_id=
> 1 docker_conn_id=
> 1 cloud_storage_conn_id=
> 1 ci_conn_id=
> {code}
>  List generated by following command:
> {code:java}
> grep -o -R '[a-z_]\+_conn_id=' --include '*_operator.py' airflow/ | cut -d 
> ":" -f 2 | sort | uniq -c |sort | tac
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5121) Normalize *_conn_id parameters

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941207#comment-16941207
 ] 

jack commented on AIRFLOW-5121:
---

[~kamil.bregula]  PR was merged. can Jira be closed?

> Normalize *_conn_id parameters
> --
>
> Key: AIRFLOW-5121
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5121
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks, operators
>Affects Versions: 1.10.3
>Reporter: Kamil Bregula
>Priority: Major
>
> Hello,
>  
> There are many ways to pass the connection ID to hooks and operators. We 
> should introduce harmony in this matter.
> List of all possible combinations.
> {code:java}
> 217 gcp_conn_id=
> 40 aws_conn_id=
> 22 google_cloud_storage_conn_id=
> 21 bigquery_conn_id=
> 12 dingding_conn_id=
> 10 ssh_conn_id=
> 6 qubole_conn_id=
> 5 postgres_conn_id=
> 4 presto_conn_id=
> 4 http_conn_id=
> 3 jdbc_conn_id=
> 3 azure_data_lake_conn_id=
> 2 wasb_conn_id=
> 2 vertica_conn_id=
> 2 spark_conn_id=
> 2 snowflake_conn_id=
> 2 sftp_conn_id=
> 2 segment_conn_id=
> 2 redis_conn_id=
> 2 jira_conn_id=
> 2 imap_conn_id=
> 2 hive_cli_conn_id=
> 2 gcp_cloudsql_conn_id=
> 2 emr_conn_id=
> 2 druid_ingest_conn_id=
> 2 dest_gcs_conn_id=
> 2 datastore_conn_id=
> 2 databricks_conn_id=
> 2 azure_cosmos_conn_id=
> 1 sqlite_conn_id=
> 1 slack_conn_id=
> 1 samba_conn_id=
> 1 registry_conn_id=
> 1 pig_cli_conn_id=
> 1 oracle_conn_id=
> 1 opsgenie_conn_id=
> 1 mysql_conn_id=
> 1 mssql_conn_id=
> 1 metastore_conn_id=
> 1 grpc_conn_id=
> 1 druid_broker_conn_id=
> 1 docker_conn_id=
> 1 cloud_storage_conn_id=
> 1 ci_conn_id=
> {code}
>  List generated by following command:
> {code:java}
> grep -o -R '[a-z_]\+_conn_id=' --include '*_operator.py' airflow/ | cut -d 
> ":" -f 2 | sort | uniq -c |sort | tac
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5114) TypeError when running S3ToGoogleCloudStorageTransferOperator or GoogleCloudStorageToGoogleCloudStorageTransferOperator with default arguments

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941214#comment-16941214
 ] 

jack commented on AIRFLOW-5114:
---

[~kamil.bregula] PR was merged. Jira can be closed

> TypeError when running S3ToGoogleCloudStorageTransferOperator or 
> GoogleCloudStorageToGoogleCloudStorageTransferOperator with default arguments
> --
>
> Key: AIRFLOW-5114
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5114
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, gcp
>Affects Versions: 1.10.3
>Reporter: Joel Croteau
>Assignee: Joel Croteau
>Priority: Major
>
> When running `S3ToGoogleCloudStorageTransferOperator` or 
> `GoogleCloudStorageToGoogleCloudStorageTransferOperator` with default 
> arguments, you get the following `TypeError`:
>  
> {noformat}
> [2019-08-05 04:13:19,873] {models.py:1796} ERROR - '>' not supported between 
> instances of 'NoneType' and 'int'
> Traceback (most recent call last):
>   File "/usr/local/lib/airflow/airflow/models.py", line 1664, in _run_raw_task
>     result = task_copy.execute(context=context)
>   File "/home/airflow/gcs/dags/dependencies/gcp_transfer_operator.py", line 
> 675, in execute
>     hook.wait_for_transfer_job(job, timeout=self.timeout)
>   File "/home/airflow/gcs/dags/dependencies/gcp_api_base_hook.py", line 188, 
> in wrapper_decorator
>     return func(self, *args, **kwargs)
>   File "/home/airflow/gcs/dags/dependencies/gcp_transfer_hook.py", line 390, 
> in wait_for_transfer_job
>     while timeout > 0:
> TypeError: '>' not supported between instances of 'NoneType' and 
> 'int'{noformat}
> This is because both operators default `timeout` to `None`, and 
> `wait_for_transfer_job` assumes `timeout` is an integer. I have a fix I can 
> submit.
>  
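> A minimal sketch of such a guard (illustrative only; `_job_is_done` is a 
> hypothetical helper and 60 seconds is an assumed default, not the submitted 
> fix):
> {code:python}
> import time
> 
> def wait_for_transfer_job(self, job, timeout=None):
>     # Normalize None before entering the countdown loop that raised
>     # the TypeError above.
>     if timeout is None:
>         timeout = 60  # assumed default, in seconds
>     while timeout > 0:
>         if self._job_is_done(job):  # hypothetical helper
>             return
>         time.sleep(10)
>         timeout -= 10
> {code}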



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5118) Airflow DataprocClusterCreateOperator does not currently support setting optional components

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941208#comment-16941208
 ] 

jack commented on AIRFLOW-5118:
---

PR was merged. Jira can be closed

> Airflow DataprocClusterCreateOperator does not currently support setting 
> optional components
> 
>
> Key: AIRFLOW-5118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5118
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Affects Versions: 1.10.3
>Reporter: Omid Vahdaty
>Assignee: Igor
>Priority: Minor
>
> There needs to be an option to install optional components via 
> DataprocClusterCreateOperator, such as Zeppelin.
> From the source code of the DataprocClusterCreateOperator[1], the only 
> software configs that can be set are the imageVersion and the properties. As 
> the Zeppelin component needs to be set through softwareConfig 
> optionalComponents[2], the DataprocClusterCreateOperator does not currently 
> support setting optional components. 
>  
> As a workaround for the time being, you could create your clusters by 
> directly using the gcloud command rather than the 
> DataprocClusterCreateOperator. Using the Airflow BashOperator[3], you can 
> execute gcloud commands that create your Dataproc cluster with the required 
> optional components. 
> [1] 
> [https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dataproc_operator.py]
>  
>  [2] 
> [https://cloud.google.com/dataproc/docs/reference/rest/v1/ClusterConfig#softwareconfig]
>  
> [3] [https://airflow.apache.org/howto/operator/bash.html] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5222) Dag run status is wrongly set to SUCCESS

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941202#comment-16941202
 ] 

jack commented on AIRFLOW-5222:
---

I think the DAG state is determined by the status of the last tasks.

So in this case, since join is skipped, the DAG is marked as success.

> Dag run status is wrongly set to SUCCESS
> 
>
> Key: AIRFLOW-5222
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5222
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 1.10.3, 1.10.4
>Reporter: Deavarajegowda M T
>Priority: Minor
> Attachments: example_branch_operator.py, 
> image-2019-08-15-12-59-15-383.png
>
>
> !image-2019-08-15-12-59-15-383.png!
> In this particular case, shouldn't the dag run status be failed instead of 
> success?
> It looks like airflow checks the status of the end tasks to decide on the dag 
> run status:
> {code:python}
> root_ids = [t.task_id for t in dag.roots]
> roots = [t for t in tis if t.task_id in root_ids]
> 
> # if all roots finished and at least one failed, the run failed
> if (not unfinished_tasks and
>         any(r.state in (State.FAILED, State.UPSTREAM_FAILED) for r in roots)):
>     self.log.info('Marking run %s failed', self)
>     self.set_state(State.FAILED)
>     dag.handle_callback(self, success=False, reason='task_failure',
>                         session=session)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5231) S3Hook delete fails with over 1000 keys

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941199#comment-16941199
 ] 

jack commented on AIRFLOW-5231:
---

In luigi this was solved with a loop deleting 1,000 keys per batch:

[https://github.com/spotify/luigi/pull/2529]

[~feluelle] maybe you would be interested in this
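
Something along these lines should work with plain boto3, since delete_objects 
caps each request at 1,000 keys (a sketch with a hypothetical helper, not the 
luigi code):

{code:python}
import boto3

def delete_keys(bucket, keys, chunk_size=1000):
    # delete_objects accepts at most 1,000 keys per request, so delete
    # in fixed-size batches, mirroring the luigi fix linked above.
    s3 = boto3.client('s3')
    for i in range(0, len(keys), chunk_size):
        chunk = keys[i:i + chunk_size]
        s3.delete_objects(
            Bucket=bucket,
            Delete={'Objects': [{'Key': k} for k in chunk]},
        )
{code}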

 

> S3Hook delete fails with over 1000 keys
> ---
>
> Key: AIRFLOW-5231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5231
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws, hooks
>Affects Versions: 2.0.0
>Reporter: Silviu Tantos
>Priority: Major
>
> Error raised:
> {noformat}
> botocore.exceptions.ClientError: An error occurred (MalformedXML) when 
> calling the DeleteObjects operation: The XML you provided was not well-formed 
> or did not validate against our published schema{noformat}
> See also: https://github.com/spotify/luigi/issues/2511



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5471) Fix docstring in GcpTransferServiceOperationsListOperator

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941194#comment-16941194
 ] 

jack commented on AIRFLOW-5471:
---

[~ash] also merged and Jira open

> Fix docstring in GcpTransferServiceOperationsListOperator
> -
>
> Key: AIRFLOW-5471
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5471
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 1.10.5
>Reporter: Tobiasz Kedzierski
>Assignee: Tobiasz Kedzierski
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5484) PigCliHook has incorrect named parameter

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941195#comment-16941195
 ] 

jack commented on AIRFLOW-5484:
---

[~ash] merged and Jira open

> PigCliHook has incorrect named parameter
> 
>
> Key: AIRFLOW-5484
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5484
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.10.6
>Reporter: Jakob Homan
>Priority: Minor
>  Labels: ccoss2019, newbie
>
> When building the connection's hook, we try to pass a parameter named 
> `pig_conn_id`. However, this parameter doesn't exist; the correct name is 
> `pig_cli_conn_id`. This causes the correct config not to be picked up.
> airflow/models/connection.py:212
> {code:java}
> elif self.conn_type == 'pig_cli':
> from airflow.hooks.pig_hook import PigCliHook
> return PigCliHook(pig_conn_id=self.conn_id) {code}
> airflow/hooks/pig_hook.py:38
> {code:java}
> def __init__(
> self,
> pig_cli_conn_id="pig_cli_default"):
> conn = self.get_connection(pig_cli_conn_id)
> self.pig_properties = conn.extra_dejson.get('pig_properties', '')
> self.conn = conn {code}
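> 
> The fix is a one-line rename of the keyword argument at the call site (a 
> sketch matching the hook signature above):
> {code:java}
> elif self.conn_type == 'pig_cli':
>     from airflow.hooks.pig_hook import PigCliHook
>     return PigCliHook(pig_cli_conn_id=self.conn_id) {code}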



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5487) airflow/utils/strings.py: Fix unused warning variable

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941179#comment-16941179
 ] 

jack commented on AIRFLOW-5487:
---

[~ash] PR was merged but Jira is still open

> airflow/utils/strings.py: Fix unused warning variable
> -
>
> Key: AIRFLOW-5487
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5487
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: utils
>Affects Versions: 1.10.6
>Reporter: Jakob Homan
>Priority: Minor
>  Labels: ccoss2019, newbie
>
> Note: This ticket's being created to facilitate a new contributor's workshop 
> for Airflow. After the workshop has completed, I'll mark these all available 
> for anyone that might like to take them on.
> airflow/utils/strings.py:27
> {code:java}
> def get_random_string(length=8, choices=string.ascii_letters + string.digits):
> '''
> Generate random string
> '''
> return ''.join([choice(choices) for i in range(length)]) {code}
> Here we use `i` as a placeholder variable, but it gets flagged as an unused 
> variable. We can replace the `i` with `_` and avoid this warning.
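> 
> i.e. (a sketch of the corrected function):
> {code:java}
> def get_random_string(length=8, choices=string.ascii_letters + string.digits):
>     '''
>     Generate random string
>     '''
>     # `_` marks the loop variable as intentionally unused
>     return ''.join([choice(choices) for _ in range(length)]) {code}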



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5490) security.py: Fix incorrect None comparison

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941178#comment-16941178
 ] 

jack commented on AIRFLOW-5490:
---

[~ash] PR was merged. Jira can be closed

> security.py: Fix incorrect None comparison
> --
>
> Key: AIRFLOW-5490
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5490
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.10.6
>Reporter: Jakob Homan
>Priority: Minor
>  Labels: ccoss2019, newbie
>
> Note: This ticket's being created to facilitate a new contributor's workshop 
> for Airflow. After the workshop has completed, I'll mark these all available 
> for anyone that might like to take them on.
> In security.py, we twice use ``==`` with ``None``, which is [not 
> correct|https://stackoverflow.com/a/3257957]
> airflow/www/security.py:343
> {code:python}
> sqla_models.PermissionView.permission == None,  # noqa pylint: 
> disable=singleton-comparison
> sqla_models.PermissionView.view_menu == None,  # noqa pylint: 
> disable=singleton-comparison
> )) {code}
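> 
> The idiomatic SQLAlchemy form avoids the suppression comments entirely (a 
> sketch of the corrected lines):
> {code:python}
> sqla_models.PermissionView.permission.is_(None),
> sqla_models.PermissionView.view_menu.is_(None),
> )) {code}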



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5568) Add Hook / Operators for GCP Healthcare API

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941171#comment-16941171
 ] 

jack commented on AIRFLOW-5568:
---

Can you please also add the GCP component?

> Add Hook / Operators for GCP Healthcare API
> ---
>
> Key: AIRFLOW-5568
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5568
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks, operators
>Affects Versions: 1.10.5
>Reporter: Jacob Ferriero
>Priority: Minor
>
> It'd be useful to have a hook for the healthcare api
> and some operators / sensors for the long-running operations 
> (https://cloud.google.com/healthcare/docs/how-tos/long-running-operations)
>  * import / export of various formats
>  * deidentification of datasets
>  [https://cloud.google.com/healthcare/docs/apis]
>  
> Note this would be a good candidate to illustrate some sort of AysncOperator 
> described in AIRFLOW-5567



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5541) Exclude example tags from coverage

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941175#comment-16941175
 ] 

jack commented on AIRFLOW-5541:
---

[~Urbaszek] can you close this? It was covered in another PR/ticket.

> Exclude example tags from coverage
> --
>
> Key: AIRFLOW-5541
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5541
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: tests
>Affects Versions: 1.10.5
>Reporter: Tomasz Urbaszek
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5566) Invalid deprecation warning for airflow home specification within config file

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941172#comment-16941172
 ] 

jack commented on AIRFLOW-5566:
---

Can you PR a fix?

> Invalid deprecation warning for airflow home specification within config file
> -
>
> Key: AIRFLOW-5566
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5566
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.10.5
>Reporter: Ian Friedman
>Priority: Minor
>
> [This deprecation 
> warning|https://github.com/apache/airflow/blob/4d4fda75333b1e6ae4e99b407ab2b1edc0d139d8/airflow/configuration.py#L551-L568]
>  is run regardless of whether an 'airflow_home' entry exists in the config 
> file when the 'AIRFLOW_HOME' environment variable is set.
> Traceback:
> [config entry check | 
> https://github.com/apache/airflow/blob/4d4fda75333b1e6ae4e99b407ab2b1edc0d139d8/airflow/configuration.py#L551]
>  -> 
> [config.has_option(...)|https://github.com/apache/airflow/blob/4d4fda75333b1e6ae4e99b407ab2b1edc0d139d8/airflow/configuration.py#L296-L304]
>  -> [get(...)| 
> https://github.com/apache/airflow/blob/4d4fda75333b1e6ae4e99b407ab2b1edc0d139d8/airflow/configuration.py#L214-L267]
> in get:
> {code:python}
> option = self._get_env_var_option(section, key)
> if option is not None:
> return option
> {code}
> 'get' returns the env var value, so 'has_option' returns True, which triggers 
> the warning
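> 
> One way to restrict the check to the file itself (a sketch, not the project's 
> eventual fix; AIRFLOW_CONFIG is assumed to hold the path to airflow.cfg):
> {code:python}
> import configparser
> import warnings
> 
> # Read the config file with a bare parser so the env-var fallback in
> # AirflowConfigParser.get() cannot satisfy has_option().
> raw = configparser.ConfigParser()
> raw.read(AIRFLOW_CONFIG)
> if raw.has_option('core', 'airflow_home'):
>     warnings.warn('airflow_home in airflow.cfg is deprecated',
>                   DeprecationWarning)
> {code}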



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1940) s3_file_transform_operator does not work with boto3

2019-09-30 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941169#comment-16941169
 ] 

jack commented on AIRFLOW-1940:
---

[~ash] can this be closed?

> s3_file_transform_operator does not work with boto3
> ---
>
> Key: AIRFLOW-1940
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1940
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws
>Affects Versions: 1.9.0
>Reporter: Christian Petro
>Assignee: Christian Petro
>Priority: Major
> Fix For: 1.10.0
>
>
> s3_file_transform_operator is incompatible with the boto3 upgrade in 1.9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

