[jira] [Commented] (AIRFLOW-32) Remove deprecated features prior to releasing Airflow 2.0

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727965#comment-16727965
 ] 

jack commented on AIRFLOW-32:
-

All tasks in this ticket have been merged.

Can it be closed?

> Remove deprecated features prior to releasing Airflow 2.0
> -
>
> Key: AIRFLOW-32
> URL: https://issues.apache.org/jira/browse/AIRFLOW-32
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Priority: Major
>  Labels: deprecated
> Fix For: 2.0.0
>
>
> A number of features have been marked for deprecation in Airflow 2.0. They 
> need to be deleted prior to release. 
> Usually the error message or comments will mention Airflow 2.0 with either a 
> #TODO or #FIXME.
> Tracking list (not necessarily complete!):
> JIRA:
> AIRFLOW-31
> AIRFLOW-200
> GitHub:
> https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
> https://github.com/airbnb/airflow/pull/1219
> https://github.com/airbnb/airflow/pull/1285



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727962#comment-16727962
 ] 

jack commented on AIRFLOW-2319:
---

[~akoeltringer] I think officially there are only 3: SQLite, PostgreSQL and 
MySQL, as these are the only DBs being tested with Travis.

> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> --
>
> Key: AIRFLOW-2319
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.9.0
>Reporter: Andreas Költringer
>Priority: Major
>
> Inserting DagRun's via {{airflow.api.common.experimental.trigger_dag}} 
> (multiple rows with the same {{(dag_id, execution_date)}}) raised the 
> following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird as the {{session.add()}} and {{session.commit()}} are right 
> before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with 
> {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
>     id INTEGER NOT NULL, 
>     dag_id VARCHAR(250), 
>     execution_date DATETIME, 
>     state VARCHAR(50), 
>     run_id VARCHAR(250), 
>     external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date 
> DATETIME, 
>     PRIMARY KEY (id), 
>     UNIQUE (dag_id, execution_date), 
>     UNIQUE (dag_id, run_id), 
>     CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint, on MariaDB it's also an index.)
> The {{DagRun}} class in {{models.py}} does not reflect this, however it is in 
> [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42]
> I looked for other migrations correcting this, but could not find any. As this 
> is not reflected in the model, I guess this is a bug?
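
A hedged sketch of what a corrective Alembic migration could look like, assuming 
MySQL/MariaDB where the stray unique constraint shows up as a unique index; the 
index name ('dag_id') is an assumption and may differ per environment:

{code:java}
from alembic import op


def upgrade():
    # Drop the extra unique index on (dag_id, execution_date); the name is
    # assumed here and should be checked with SHOW INDEX FROM dag_run first.
    op.drop_index('dag_id', table_name='dag_run')


def downgrade():
    op.create_index('dag_id', 'dag_run', ['dag_id', 'execution_date'], unique=True)
{code}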



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3045) Duplicate entry error with MySQL when update task_instances

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727960#comment-16727960
 ] 

jack commented on AIRFLOW-3045:
---

Shouldn't this have broken any MySQL back-end installation since 1.10.0?

> Duplicate entry error with MySQL when update task_instances
> ---
>
> Key: AIRFLOW-3045
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3045
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.10.0
>Reporter: Haotian Wu
>Assignee: Haotian Wu
>Priority: Major
>
> h3. How to reproduce
> # Set up apache-airflow==1.10.0 with MySQL, and bring up both the webserver and 
> scheduler.
> # Add a DAG; it becomes running, but none of the tasks will actually be 
> executed.
> # Manually trigger another run for the same dag, airflow scheduler will crash 
> with error {{sqlalchemy.exc.IntegrityError: 
> (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 
> 'xxx-yy--MM-DD ...' for key 'PRIMARY'")}}
> h3. The Reason
> In Airflow 1.10.0, the execution_date field of task_instance was changed from 
> DateTime to Timestamp. However, in MySQL the first Timestamp column in a table 
> is implicitly declared with an {{ON UPDATE CURRENT_TIMESTAMP}} clause. The 
> table in MySQL will look like this after {{airflow initdb}}:
> || Field || Type || Null || Key || Default || Extra ||
> | task_id | varchar(250) | NO | PRI | NULL | |
> | dag_id | varchar(250) | NO | PRI | NULL | |
> | execution_date | timestamp(6) | NO | PRI | CURRENT_TIMESTAMP(6) | on update CURRENT_TIMESTAMP(6) |
> # When a task_instance is updated from state NULL to state "scheduled", its 
> execution_date is also reset to the current timestamp automatically.
> # A task_instance is linked to a given dag_run by the same execution_date, so a 
> changed execution_date means the task_instance is no longer linked to any known 
> dag_run.
> # If there is more than one dag_run for the same dag, multiple task_instances 
> with the same  will be "unlinked" from their dag_run. The Airflow 
> scheduler will try to update them to state NULL and thus try to update them 
> to the same  primary key.
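
A hedged sketch of a corrective migration: redeclare execution_date with an 
explicit default so MySQL does not attach the implicit ON UPDATE behaviour. The 
column and table names come from the description above; the exact DDL is an 
assumption, not necessarily the fix that was actually shipped.

{code:java}
from alembic import op


def upgrade():
    # Explicitly declaring the DEFAULT (and no ON UPDATE clause) stops MySQL
    # from auto-attaching "ON UPDATE CURRENT_TIMESTAMP(6)" to this column.
    op.execute("""
        ALTER TABLE task_instance
        MODIFY execution_date TIMESTAMP(6) NOT NULL
        DEFAULT CURRENT_TIMESTAMP(6)
    """)
{code}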



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2609) Fix small issue with the BranchPythonOperator. It currently is skipping tasks it should not.

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727957#comment-16727957
 ] 

jack commented on AIRFLOW-2609:
---

This ticket needs to stay open. The PR hasn't been merged yet:

https://github.com/apache/incubator-airflow/pull/3530

> Fix small issue with the BranchPythonOperator. It currently is skipping tasks 
> it should not.
> 
>
> Key: AIRFLOW-2609
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2609
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Sandro Luck
>Assignee: Sandro Luck
>Priority: Minor
>
> Current behavior: when you branch from A, e.g. BranchPythonOperator A '->' 
> (B or C), and you also add B '->' C, then C will be skipped even though it is a 
> downstream task of B. Desired behavior: only skip downstream tasks which are 
> not in the list of downstream tasks of the branch taken (see the sketch below).
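
A minimal sketch of the topology described above (task names and the DAG are 
illustrative, not from the original report):

{code:java}
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator

dag = DAG('branch_example', start_date=datetime(2018, 1, 1), schedule_interval=None)

# A branches to either B or C; B also feeds into C.
a = BranchPythonOperator(task_id='A', python_callable=lambda: 'B', dag=dag)
b = DummyOperator(task_id='B', dag=dag)
c = DummyOperator(task_id='C', dag=dag)

a.set_downstream(b)
a.set_downstream(c)
b.set_downstream(c)

# With the behaviour described above, picking branch 'B' still skips C,
# even though C is downstream of B and should eventually run.
{code}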



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3047) HiveCliHook does not work properly with Beeline

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727950#comment-16727950
 ] 

jack commented on AIRFLOW-3047:
---

[~vladglinskiy] can you submit a PR for this?

> HiveCliHook does not work properly with Beeline
> ---
>
> Key: AIRFLOW-3047
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3047
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, hooks
>Affects Versions: 1.10.0
>Reporter: Vladislav Glinskiy
>Priority: Major
>
> A simple _HiveOperator_ does not work properly when the _hive_cli_default_ 
> connection is configured to use _Beeline_.
>  
> *Steps to reproduce:* 
> 1. Set up a Hive/HiveServer2 and Airflow environment with _beeline_ in _PATH_.
> 2. Create a test _datetimes_ table, for example:
> {code:java}
> CREATE EXTERNAL TABLE datetimes (
> datetimes STRING)
> STORED AS PARQUET
> LOCATION '/opt/apps/datetimes';{code}
>  
> 3. Edit _hive_cli_default_ connection:
> {code:java}
> airflow connections --delete --conn_id hive_cli_default
> airflow connections --add --conn_id hive_cli_default --conn_type hive_cli 
> --conn_host $HOST --conn_port 1 --conn_schema default --conn_login 
> $CONN_LOGIN --conn_password $CONN_PASSWORD --conn_extra "{\"use_beeline\": 
> true, \"auth\": \"null;user=$HS_USER;password=$HS_PASSWORD\"}"
> {code}
> Set variables according to your environment.
>  
> 4. Create a simple DAG:
> {code:java}
> """
> ###
> Sample DAG, which declares single Hive task.
> """
> import datetime
> import airflow
> from airflow import DAG
> from airflow.operators.hive_operator import HiveOperator
> from datetime import timedelta
> default_args = {
>   'owner': 'airflow',
>   'depends_on_past': False,
>   'start_date': airflow.utils.dates.days_ago(0, hour=0, minute=0, second=1),
>   'email': ['airf...@example.com'],
>   'email_on_failure': False,
>   'email_on_retry': False,
>   'retries': 1,
>   'retry_delay': timedelta(minutes=5),
>   'provide_context': True
> }
> dag = DAG(
> 'hive_task_dag',
> default_args=default_args,
> description='Single task DAG',
> schedule_interval=timedelta(minutes=15))
> insert_current_datetime = HiveOperator(
> task_id='insert_current_datetime_task',
> hql="insert into table datetimes values ('" + 
> datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y") + "');",
> dag=dag)
> dag.doc_md = __doc__
> {code}
>  
> 5. Trigger DAG execution. Ensure that DAG completes successfully.
> 6. Check _datetimes_ table. It will be empty.
>  
> As it turned out, the issue is caused by an invalid temporary script file. The 
> problem is fixed if we add a newline character at the end of the script.
> So, a possible fix is to change:
> *hive_hooks.py:182*
> {code:java}
> if schema:
> hql = "USE {schema};\n{hql}".format(**locals())
> {code}
> to
> {code:java}
> if schema:
> hql = "USE {schema};\n{hql}\n".format(**locals())
> {code}
> I don't know how this affects _hive shell_ queries, since it was tested only 
> against _beeline_.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3243) UI task and dag clear feature cannot pick up dag parameters

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727944#comment-16727944
 ] 

jack commented on AIRFLOW-3243:
---

I don't think max_active_runs is enforced when clearing tasks.

> UI task and dag clear feature cannot pick up dag parameters
> ---
>
> Key: AIRFLOW-3243
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3243
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: chengningzhang
>Priority: Major
>
> Hi,
>     I am hitting an issue with the airflow UI dag and task "clear" feature. When 
> I clear tasks from the UI, the dag parameters are not picked up by the 
> cleared tasks.
>     For example, I have "max_active_runs=1" in my dag parameters, but when I 
> manually clear the tasks, this parameter is not picked up. The same cleared 
> tasks with different schedule times will run in parallel.
>     Is there a way we can improve this, as we may want to backfill some data and 
> just clear the past tasks from the airflow UI?
>  
> Thanks,
> Chengning



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3017) 404 error when opening log in the Web UI

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727939#comment-16727939
 ] 

jack commented on AIRFLOW-3017:
---

This seems like a local issue on the old UI, which is deprecated in 2.0.0.

> 404 error when opening log in the Web UI
> 
>
> Key: AIRFLOW-3017
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3017
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10.0
>Reporter: Victor
>Priority: Major
>
> I opened the logs of one of the tasks of a DAG and saw the following error in 
> the console of my browser:
> GET https://AIRFLOW:8080/admin/admin/admin/js/form-1.0.0.js net::ERR_ABORTED 
> 404
> I suppose there is a typo somewhere in the code…



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2842) GCS rsync operator

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727842#comment-16727842
 ] 

jack commented on AIRFLOW-2842:
---

[~dlamblin] This can be achieved with a BashOperator, but you could say that 
about everything.

In any case, having an operator for this would make life easier (you don't need 
to manage separate connection files with credentials, etc.); a rough BashOperator 
workaround is sketched below.
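
A hedged sketch of that workaround (bucket names are placeholders, gsutil 
authentication is assumed to already be configured on the worker, and {{dag}} is 
assumed to be defined elsewhere):

{code:java}
from airflow.operators.bash_operator import BashOperator

# Mirror gs://source-bucket into gs://dest-bucket, deleting extra objects (-d),
# roughly what a dedicated rsync operator would wrap.
sync_buckets = BashOperator(
    task_id='gcs_rsync',
    bash_command='gsutil -m rsync -d -r gs://source-bucket gs://dest-bucket',
    dag=dag)
{code}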

> GCS rsync operator
> --
>
> Key: AIRFLOW-2842
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2842
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Vikram Oberoi
>Priority: Major
>
> The GoogleCloudStorageToGoogleCloudStorageOperator supports copying objects 
> from one bucket to another using a wildcard.
> As long as you don't delete anything in the source bucket, the destination 
> bucket will end up synchronized on every run.
> However, each object gets copied over even if it exists at the destination, 
> which makes this operation inefficient, time-consuming, and potentially 
> costly.
> I'd love an operator that behaves like `gsutil rsync` for when I need to 
> synchronize two buckets, supporting `gsutil rsync -d` behavior as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3288) Add SNS integration

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727827#comment-16727827
 ] 

jack commented on AIRFLOW-3288:
---

[~ashb] This was merged. The ticket can be closed.

> Add SNS integration
> ---
>
> Key: AIRFLOW-3288
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3288
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Szymon Bilinski
>Assignee: Szymon Bilinski
>Priority: Major
>
> I'd like to propose a new {{contrib}} hook and a basic operator for 
> publishing *Amazon SNS* notifications.
> Motivation: 
> - Useful for integrating various Amazon services and pretty general in 
> nature: 
> -- AWS SQS
> -- AWS Lambda
> -- E-mail
> -- ... 
> - Similar functionality already 
> [exists|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/example_dags/example_pubsub_flow.py]
>  for GCP (i.e. Pub/Sub integration).
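
A hedged illustration of what such a hook/operator would wrap (plain boto3, not 
an existing Airflow API; the region and topic ARN are placeholders):

{code:java}
import boto3

client = boto3.client('sns', region_name='us-east-1')
client.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:example-topic',
    Subject='Airflow notification',
    Message='Task finished',
)
{code}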



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3356) Scheduler gets stuck for certain DAGs

2018-12-22 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727825#comment-16727825
 ] 

jack commented on AIRFLOW-3356:
---

I'm having the same issue on 1.9.

Some tasks are stuck in the running state even though they are not actually 
running (tasks that take 1-4 minutes to execute get stuck for hours)... Only 
clearing them solves this, and then they are re-scheduled.

> Scheduler gets stuck for certain DAGs
> -
>
> Key: AIRFLOW-3356
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3356
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0
>Reporter: John Smodic
>Priority: Critical
>
> I observe the scheduler getting stuck for certain DAGs:
> Nov 15 19:11:48 ip-172-16-13-120 python3.6[1319]: File Path PID Runtime Last 
> Runtime Last Run
> Nov 15 19:11:48 ip-172-16-13-120 python3.6[1319]: 
> /home/ubuntu/airflow/dags/stuck_dag.py 14241 *19977.55s* 1.05s 
> 2018-11-15T13:38:47
> Nov 15 19:11:48 ip-172-16-13-120 python3.6[1319]: 
> /home/ubuntu/airflow/dags/not_stuck_dag.py 19906 0.05s 1.05s 
> 2018-11-15T19:11:44
>  
> The "Runtime" of the stuck DAG's scheduling process is huge and I can't tell 
> what it's doing. There's no mention of that DAG in the scheduler logs 
> otherwise.
>  
> The mapped process looks like this:
> ubuntu 14241 0.0 0.3 371132 63232 ? S 13:38 0:00 /usr/bin/python3.6 
> /usr/local/bin/airflow scheduler
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3426) Correct references to Python version tested (3.4 -> 3.5)

2018-12-20 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725867#comment-16725867
 ] 

jack commented on AIRFLOW-3426:
---

This was merged. Ticket can be closed.

> Correct references to Python version tested (3.4 -> 3.5)
> 
>
> Key: AIRFLOW-3426
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3426
> Project: Apache Airflow
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
> Environment: All
>Reporter: Bryant Biggs
>Assignee: Bryant Biggs
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: 1.10.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The current CI tests on Travis use Python versions 2.7 and 3.5; however, 
> throughout the documentation there are still references to using/supporting 
> 3.4. To better match what is actually supported, the 3.4 references should be 
> replaced with what is actually being tested: 3.5.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3254) BigQueryGetDataOperator to support reading query from SQL file

2018-12-20 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-3254:
--
Affects Version/s: (was: 1.10.0)
   1.10.1

> BigQueryGetDataOperator to support reading query from SQL file
> --
>
> Key: AIRFLOW-3254
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3254
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.10.1
>Reporter: jack
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.2
>
>
> As discussed with [~Fokko] on Slack:
> Currently the BigQueryGetDataOperator only supports reading a query provided 
> directly as:
>  
> {code:java}
> sql = 'SELECT ID FROM TABLE'
> {code}
>  
> it does not support reading the query from a SQL file, which can be annoying 
> as queries are sometimes quite large.
> This behavior is supported by other operators like 
> MySqlToGoogleCloudStorageOperator:
> {code:java}
> dag = DAG(
>     dag_id='Import',
>     default_args=args,
>     schedule_interval='*/5 * * * *',
>     max_active_runs=1,
>     catchup=False,
>     template_searchpath = ['/home/.../airflow/…/sql/Import']
> )
> 
> importop = MySqlToGoogleCloudStorageOperator(
>     task_id='import',
>     mysql_conn_id='MySQL_con',
>     google_cloud_storage_conn_id='gcp_con',
>     provide_context=True,
>     sql = 'importop.sql',
>     params={'table_name' : TABLE_NAME},
>     bucket=GCS_BUCKET_ID,
>     filename=file_name_orders,
>     dag=dag)
> {code}
>  
> If anyone can pick it up it would be great :)
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2143) Try number displays incorrect values in the web UI

2018-12-20 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725686#comment-16725686
 ] 

jack commented on AIRFLOW-2143:
---

I see this also

> Try number displays incorrect values in the web UI
> --
>
> Key: AIRFLOW-2143
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2143
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: James Davidheiser
>Priority: Minor
> Attachments: adhoc_query.png, task_instance_page.png
>
>
> This was confusing us a lot in our task runs - in the database, a task that 
> ran is marked as 1 try.  However, when we view it in the UI, it shows as 2 
> tries in several places.  These include:
>  * Task Instance Details (ie 
> [https://airflow/task?execution_date=xxx_id=xxx_id=xxx 
> )|https://airflow/task?execution_date=xxx_id=xxx_id=xxx]
>  * Task instance browser (/admin/taskinstance/)
>  * Task Tries graph (/admin/airflow/tries)
> Notably, it is correctly shown as 1 try in the log filenames, on the log 
> viewer page (admin/airflow/log?execution_date=), and some other places.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-644) Issue with past runs when using starttime as datetime.now()

2018-12-20 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725680#comment-16725680
 ] 

jack commented on AIRFLOW-644:
--

It's bad practice to use 'start_date': datetime.now(); see the sketch below for 
the usual alternative.
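
A minimal sketch of the commonly recommended pattern, assuming a static, past 
start date (the date itself is just an example):

{code:java}
from datetime import datetime

default_args = {
    'owner': 'dwh',
    'depends_on_past': True,
    'wait_for_downstream': True,
    # a fixed date in the past instead of datetime.now()
    'start_date': datetime(2016, 11, 1),
}
{code}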

> Issue with past runs when using starttime as datetime.now()
> ---
>
> Key: AIRFLOW-644
> URL: https://issues.apache.org/jira/browse/AIRFLOW-644
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Reporter: Puneeth Potu
>Priority: Major
>
> Hi, we used the following snippet in the dag parameters:
> {code:java}
> default_args = {
>     'owner': 'dwh',
>     'depends_on_past': True,
>     'wait_for_downstream': True,
>     'start_date': datetime.now(),
> {code}
> When using datetime.now() with a @daily schedule, I see the last 5 runs in my 
> graph view, and the dag status of all the previous runs is "FAILED".
> When using datetime.now() with a @monthly schedule, I see the last 14 runs in 
> my graph view, and the dag status of all the previous runs is "FAILED".
> When using datetime.now() with a @weekly schedule, I see the last 53 runs in my 
> graph view, and the dag status of all the previous runs is "FAILED".
> For monthly and weekly it is not showing either the current week or month. I 
> activated my DAGs today (11/22/2016).
> I see weekly runs populated from (2015-11-15 to 2016-11-13), and I don't see 
> 2016-11-20 which is the latest.
> I see Monthly runs populated from (2015-09-01 to 2016-10-01) and I don't see 
> 2016-11-01 which is the latest.
> Please advise if this is the expected behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-549) Scheduler child logs are created out of normal location

2018-12-20 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725682#comment-16725682
 ] 

jack commented on AIRFLOW-549:
--

[~bolke] is this still an issue?

> Scheduler child logs are created out of normal location
> ---
>
> Key: AIRFLOW-549
> URL: https://issues.apache.org/jira/browse/AIRFLOW-549
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Assignee: Paul Yang
>Priority: Major
>
> The new scheduler has child processes logging to their own log files. The 
> location of the log files is set outside of the CLI-configurable locations, 
> making it inconsistent with other log configurations in Airflow. In addition, 
> the log files are by default created in /tmp, which is a non-standard location 
> for log files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1288) Bad owners field in DAGs breaks Airflow front page

2018-12-19 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725610#comment-16725610
 ] 

jack commented on AIRFLOW-1288:
---

Did you set your owner to be a list instead of a string?
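
A quick illustration of that point (the owner value is just an example); owner 
is expected to be a plain string in default_args:

{code:java}
default_args = {
    'owner': 'data-team',        # OK: a plain string
    # 'owner': ['data-team'],    # a list here can break the front page as described below
}
{code}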

> Bad owners field in DAGs breaks Airflow front page
> --
>
> Key: AIRFLOW-1288
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1288
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Dan Davydov
>Priority: Major
>
> DAGs that have owners set to a bad value break the front page of the 
> webserver with an error like below. Instead these should just cause import 
> errors for the specific dags in question.
> {code}
> Ooops.
>   / (  ()   )  \___
>  /( (  (  )   _))  )   )\
>(( (   )()  )   (   )  )
>  ((/  ( _(   )   (   _) ) (  () )  )
> ( (  ( (_)   (((   )  .((_ ) .  )_
>( (  )(  (  ))   ) . ) (   )
>   (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
>   ( (  (   ) (  )   (  )) ) _)(   )  )  )
>  ( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
>   (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
>  ( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
>   ((  (   )(( _)   _) _(_ (  (_ )
>(_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
>((__)\\||lll|l||///  \_))
> (   /(/ (  )  ) )\   )
>   (( ( ( | | ) ) )\   )
>(   /(| / ( )) ) ) )) )
>  ( ( _(|)_) )
>   (  ||\(|(|)|/|| )
> (|(||(||))
>   ( //|/l|||)|\\ \ )
> (/ / //  /|//\\  \ \  \ _)
> ---
> Node: i-0dbbddfb63fb2cfbc.inst.aws.airbnb.com
> ---
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File 
> "/usr/local/lib/python2.7/dist-packages/newrelic/hooks/framework_flask.py", 
> line 103, in _nr_wrapper_Flask_handle_exception_
> return wrapped(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File 
> "/usr/local/lib/python2.7/dist-packages/newrelic/hooks/framework_flask.py", 
> line 40, in _nr_wrapper_handler_
> return wrapped(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 
> 367, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 758, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 
> 1909, in index
> all_dag_ids=all_dag_ids)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 
> 307, in render
> return render_template(template, **kwargs)
>   File 
> "/usr/local/lib/python2.7/dist-packages/newrelic/api/function_trace.py", line 
> 110, in literal_wrapper
> return wrapped(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask/templating.py", line 
> 128, in render_template
> context, ctx.app)
>   File "/usr/local/lib/python2.7/dist-packages/flask/templating.py", line 
> 110, in _render
> rv = template.render(context)
>   File 
> "/usr/local/lib/python2.7/dist-packages/newrelic/api/function_trace.py", line 
> 98, in dynamic_wrapper
> return wrapped(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 
> 989, in render
> return self.environment.handle_exception(exc_info, True)
>   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 
> 754, in handle_exception
>  

[jira] [Commented] (AIRFLOW-1322) Cannot mark task as success

2018-12-19 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725607#comment-16725607
 ] 

jack commented on AIRFLOW-1322:
---

[~Fokko] is this still an issue?

> Cannot mark task as success
> ---
>
> Key: AIRFLOW-1322
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1322
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Fokko Driesprong
>Priority: Major
>
> Hi guys,
> I've noticed that when I trigger a new job using the UI, I'm not able to `Mark 
> Successful`. When I trigger a job using the CLI, this option does appear. I 
> have the feeling that the jobs are not properly created when a job is 
> triggered using the UI.
> I want to look into it when I have more time.
> Cheers, Fokko



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1362) Paused dag restarted on upgrading airflow from 1.8.0 to 1.8.1

2018-12-19 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725609#comment-16725609
 ] 

jack commented on AIRFLOW-1362:
---

Did it happen when you performed the upgrade in your test environment?

> Paused dag restarted on upgrading airflow from 1.8.0 to 1.8.1
> -
>
> Key: AIRFLOW-1362
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1362
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: user_airflow
>Priority: Major
>
> Recently we upgraded airflow from 1.8.0 to 1.8.1. The upgrade went fine, but 
> once I restarted the web server and scheduler, all paused dags restarted 
> automatically and started running multiple runs from the date they had been 
> stopped. It messed up most of our user data and we needed to clean up the data 
> manually. How can we prevent this from happening in future upgrades?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3538) BigQueryToCloudStorageOperator to support bucket

2018-12-18 Thread jack (JIRA)
jack created AIRFLOW-3538:
-

 Summary: BigQueryToCloudStorageOperator to support bucket
 Key: AIRFLOW-3538
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3538
 Project: Apache Airflow
  Issue Type: Improvement
  Components: gcp, operators
Affects Versions: 2.0.0
Reporter: jack


Currently
{code:java}
BigQueryToCloudStorageOperator  {code}
does not support a bucket property.

 

This means that
{code:java}
destination_cloud_storage_uris {code}
needs the full path, e.g.
{code:java}
gs://bucket{code}
 

This behavior isn't consistent with other operators like:

GoogleCloudStorageDownloadOperator

MySqlToGoogleCloudStorageOperator

GoogleCloudStorageToBigQueryOperator

 

and many more... all support a bucket parameter and don't require the user to 
define the full path, only the relative path after the bucket name.

 

Consistency is important because, when using many different operators, similar 
templates reduce the need to define specific variables.

For example, in my case I must keep two variables:

a - gs://bucket_name/folder

b - folder   (bucket_name already defined in the bucket property).
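
Illustrative comparison (the table, bucket and file names are placeholders, 
{{dag}} is assumed to be defined elsewhere, and the bucket-aware form is the 
wished-for API, not an existing one):

{code:java}
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator

# Today: the destination must be a full gs:// URI.
export_current = BigQueryToCloudStorageOperator(
    task_id='export_current',
    source_project_dataset_table='project.dataset.table',
    destination_cloud_storage_uris=['gs://bucket_name/folder/export-*.csv'],
    dag=dag)

# Wished-for, bucket-aware form (hypothetical parameters):
# export_wished = BigQueryToCloudStorageOperator(
#     task_id='export_wished',
#     source_project_dataset_table='project.dataset.table',
#     bucket='bucket_name',
#     filename='folder/export-*.csv',
#     dag=dag)
{code}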



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-698) dag_run "scheduled" property should be it's own DB column

2018-12-17 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722792#comment-16722792
 ] 

jack commented on AIRFLOW-698:
--

How does setting up a new column resolve this?

> dag_run "scheduled" property should be it's own DB column
> -
>
> Key: AIRFLOW-698
> URL: https://issues.apache.org/jira/browse/AIRFLOW-698
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Dan Davydov
>Priority: Major
>  Labels: beginner, starter
>
> The airflow scheduler only executes dag_runs that have a run_id that starts 
> with "scheduled__". This can be very confusing, especially when manually 
> creating a dagrun and forgetting the "scheduled__" prefix. The "scheduled" 
> part should be pulled into a separate column so that it is very clear in the 
> UI that a user is creating a DAG run that isn't scheduled.
> cc [~maxime.beauche...@apache.org]
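
A tiny illustration of the run_id convention the description refers to (the 
prefix string matches the description; the date is just an example):

{code:java}
from datetime import datetime

SCHEDULED_PREFIX = "scheduled__"   # only runs whose run_id starts with this are picked up

execution_date = datetime(2016, 12, 17)
run_id = SCHEDULED_PREFIX + execution_date.isoformat()
# -> "scheduled__2016-12-17T00:00:00"; a manually created run without the prefix
#    will sit idle, which is the confusion described above.
{code}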



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-953) Tasks marked as successful should have complete state set

2018-12-17 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722790#comment-16722790
 ] 

jack commented on AIRFLOW-953:
--

Why should duration be 0? It logs the time it took for the task to complete.

> Tasks marked as successful should have complete state set
> -
>
> Key: AIRFLOW-953
> URL: https://issues.apache.org/jira/browse/AIRFLOW-953
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Dan Davydov
>Priority: Major
>  Labels: beginner, starter
>
> Tasks marked as successful should have start/end_date/duration/operator set. 
> start/end dates should be the same and denote the time the task was marked as 
> successful, duration should be 0, and the operator should be filled in 
> correctly with the task's operator.
> This should be fixed because otherwise the task instance state is not 
> complete which could break some operations in Airflow, and prevents things 
> like scripts that delete old tasks from Airflow (since e.g. start_date is not 
> specified for these tasks).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2660) Deadlock error from SQLAlchemy when the parallelism is increased.

2018-12-17 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722777#comment-16722777
 ] 

jack commented on AIRFLOW-2660:
---

I suspect that this is a known MySQL issue:

[https://stackoverflow.com/questions/41015813/avoiding-mysql-deadlock-when-upgrading-shared-to-exclusive-lock]

 

 

 

> Deadlock error from SQLAlchemy when the parallelism is increased.
> -
>
> Key: AIRFLOW-2660
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2660
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: 1.9.0
> Environment: RHEL 7
>Reporter: Vignesh
>Priority: Minor
>  Labels: patch
> Fix For: 1.9.0
>
>
> Faced the following errors when running a multi-worker distributed setup of 
> airflow 1.9. These errors occurred when the number of scheduler threads, 
> parallelism and dag_concurrency were increased in the configuration.
>  
> Original exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock 
> found when trying to get lock; try restarting transaction') [SQL: u'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: (u'queued', 'join', 'parent_dag.sub_dag', 
> datetime.datetime(2018, 6, 18, 7, 1, 44))] (Background on this error at: 
> [http://sqlalche.me/e/e3q8])
>  
> sqlalchemy.exc.InvalidRequestError: This Session's transaction has been 
> rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, first issue Session.rollback(). Original 
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found 
> when trying to get lock; try restarting transaction') [SQL: u'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: (u'queued', 'join_0', 'parent_dag.sub_dag', 
> datetime.datetime(2018, 6, 18, 7, 1, 44))] (Background on this error at: 
> [http://sqlalche.me/e/e3q8])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3254) BigQueryGetDataOperator to support reading query from SQL file

2018-12-16 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722503#comment-16722503
 ] 

jack commented on AIRFLOW-3254:
---

[~kaxilnaik] any chance you are working on this for 1.10.2?

> BigQueryGetDataOperator to support reading query from SQL file
> --
>
> Key: AIRFLOW-3254
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3254
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: jack
>Assignee: Kaxil Naik
>Priority: Minor
>
> As discussed with [~Fokko] on Slack:
> Currently the BigQueryGetDataOperator only supports reading a query provided 
> directly as:
>  
> {code:java}
> sql = 'SELECT ID FROM TABLE'
> {code}
>  
> it does not support reading the query from a SQL file, which can be annoying 
> as queries are sometimes quite large.
> This behavior is supported by other operators like 
> MySqlToGoogleCloudStorageOperator:
> {code:java}
> dag = DAG(
>     dag_id='Import',
>     default_args=args,
>     schedule_interval='*/5 * * * *',
>     max_active_runs=1,
>     catchup=False,
>     template_searchpath = ['/home/.../airflow/…/sql/Import']
> )
> 
> importop = MySqlToGoogleCloudStorageOperator(
>     task_id='import',
>     mysql_conn_id='MySQL_con',
>     google_cloud_storage_conn_id='gcp_con',
>     provide_context=True,
>     sql = 'importop.sql',
>     params={'table_name' : TABLE_NAME},
>     bucket=GCS_BUCKET_ID,
>     filename=file_name_orders,
>     dag=dag)
> {code}
>  
> If anyone can pick it up it would be great :)
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-490) Scope should be auto-added to the connection per GCP hook

2018-12-12 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719850#comment-16719850
 ] 

jack commented on AIRFLOW-490:
--

After checking... this was already implemented:

https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_api_base_hook.py#L32

> Scope should be auto-added to the connection per GCP hook
> -
>
> Key: AIRFLOW-490
> URL: https://issues.apache.org/jira/browse/AIRFLOW-490
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>
> The GCP hooks should auto-add the scope they need to work to the GCP 
> connection, so a user doesn't need to specify the obvious scopes, like:
> https://www.googleapis.com/auth/cloud-platform
> We should keep the scope field for extra scopes requested for special cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1536) DaemonContext uses default umask 0

2018-12-12 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719251#comment-16719251
 ] 

jack commented on AIRFLOW-1536:
---

This could be a vulnerability, though I'm not sure it's urgent to fix.

> DaemonContext uses default umask 0
> --
>
> Key: AIRFLOW-1536
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1536
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, security
>Reporter: Timothy O'Keefe
>Priority: Major
>
> All DaemonContext instances used for worker, scheduler, webserver, flower, 
> etc. do not supply a umask argument. See here for example:
> https://github.com/apache/incubator-airflow/blob/b0669b532a7be9aa34a4390951deaa25897c62e6/airflow/bin/cli.py#L869
> As a result, the DaemonContext will use the default umask=0, which leaves user 
> data exposed. A BashOperator, for example, that writes any files would create 
> them with permissions rw-rw-rw-, as would any airflow logs.
> I believe the umask should either be configurable, or inherited from the 
> parent shell, or both.
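
A hedged sketch of the configurable-umask idea using python-daemon directly (the 
umask value and the {{run_worker}} entry point are placeholders, not the cli.py 
code itself):

{code:java}
import daemon


def run_worker():
    pass  # placeholder for the daemonized entry point

# 0o077 keeps files created by the daemonized process private to the airflow user.
ctx = daemon.DaemonContext(umask=0o077)
with ctx:
    run_worker()
{code}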



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3210) Changing defaults types in BigQuery Hook break BigQuery operator

2018-12-12 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719224#comment-16719224
 ] 

jack commented on AIRFLOW-3210:
---

I think it was solved by https://github.com/apache/incubator-airflow/pull/4274

> Changing defaults types in BigQuery Hook break BigQuery operator
> 
>
> Key: AIRFLOW-3210
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3210
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, gcp
>Reporter: Sergei Guschin
>Priority: Major
>
> Changes in BigQuery Hook break BigQuery operator run_query() and all DAGs 
> which accommodate current type (Boolean or value):
> [BigQuery operator 
> set|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/bigquery_operator.py#L115-L121]:
> destination_dataset_table=False,
> udf_config=False,
> [New BigQuery hook 
> expects|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/bigquery_hook.py#L645-L650]:
> (udf_config, 'userDefinedFunctionResources', None, list),
> (destination_dataset_table, 'destinationTable', None, dict),



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-353) when multiple tasks removed update state fails

2018-12-12 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719230#comment-16719230
 ] 

jack commented on AIRFLOW-353:
--

Did [https://github.com/apache/incubator-airflow/pull/1675] resolve this issue?

> when multiple tasks removed update state fails
> --
>
> Key: AIRFLOW-353
> URL: https://issues.apache.org/jira/browse/AIRFLOW-353
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Reporter: Yiqing Jin
>Assignee: Yiqing Jin
>Priority: Major
>
> If multiple tasks get removed during a dag run, update_state may not work 
> properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-352) filter_by_owner is not working when use ldap authentication

2018-12-12 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719226#comment-16719226
 ] 

jack commented on AIRFLOW-352:
--

Duplicate of https://issues.apache.org/jira/browse/AIRFLOW-1552

> filter_by_owner is not working when use ldap authentication
> ---
>
> Key: AIRFLOW-352
> URL: https://issues.apache.org/jira/browse/AIRFLOW-352
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, security, webserver
>Affects Versions: 1.7.1.3
> Environment: ubuntu 14.04 LTS ,  ldap without encryption 
>Reporter: peter pang
>Priority: Major
>  Labels: security
>
> I set airflow.cfg as follows:
> {noformat}
> [webserver]
> filter_by_owner = True
> authenticate = TRUE
> auth_backend = airflow.contrib.auth.backends.ldap_auth
> [ldap]
> uri = ldap://xx.xx.xx.xx
> user_filter = objectClass=*
> user_name_attr = uid
> superuser_filter = 
> memberOf=CN=airflow-super-users,OU=Groups,OU=RWC,OU=US,OU=NORAM,DC=example,DC=com
> data_profiler_filter = 
> memberOf=CN=airflow-data-profilers,OU=Groups,OU=RWC,OU=US,OU=NORAM,DC=example,DC=com
> bind_user = cn=admin,dc=example,dc=com
> bind_password = secret
> basedn = dc=example,dc=com
> cacert = /etc/ca/ldap_ca.crt
> search_scope=SUBTREE
> {noformat}
> Then I ran the web UI, and I can log in with the superuser and data_profiler 
> user. But after logging in with the data profiler user and entering that user's 
> home view, there are no dags listed with the same dag owner. It seems the 
> filter_by_owner setting is not working.
> Debugging into views.py --> class HomeView(AdminIndexView): 
> {color:red}current_user.username{color} always gets {color:red}"None"{color}. 
> It seems we can't get the username directly.
> So, continuing to debug into ldap_auth.py --> class LdapUser(models.User), 
> I added a method to return the username:
> {code}
>  def get_username(self):
> return self.user.username
> {code}
> Then back in views.py, replacing 'current_user.username' with 
> {color:red}'current_user.get_username()'{color}, the user filter works 
> now!
> I don't know exactly why, but the modification works...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-490) Scope should be auto-added to the connection per GCP hook

2018-12-12 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719220#comment-16719220
 ] 

jack commented on AIRFLOW-490:
--

Sounds like a reasonable idea

> Scope should be auto-added to the connection per GCP hook
> -
>
> Key: AIRFLOW-490
> URL: https://issues.apache.org/jira/browse/AIRFLOW-490
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>
> The GCP hooks should auto-add the scope they need to work to the GCP 
> connection, so a user doesn't need to specify the obvious scopes, like:
> https://www.googleapis.com/auth/cloud-platform
> We should keep the scope field for extra scopes requested for special cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-473) Allow the usage of the same connection "service key" for GCP DataFlow

2018-12-12 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719216#comment-16719216
 ] 

jack commented on AIRFLOW-473:
--

Is this still an issue?

> Allow the usage of the same connection "service key" for GCP DataFlow
> -
>
> Key: AIRFLOW-473
> URL: https://issues.apache.org/jira/browse/AIRFLOW-473
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>
> We need a way to pass the service key to the Cloud DataFlow jar file in 
> places where the local service account is not available (for example in a 
> kubernetes cluster).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-691) Add SSH keepalive option to ssh_hook

2018-12-12 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719213#comment-16719213
 ] 

jack commented on AIRFLOW-691:
--

Fixed with https://github.com/apache/incubator-airflow/pull/1937

> Add SSH keepalive option to ssh_hook
> 
>
> Key: AIRFLOW-691
> URL: https://issues.apache.org/jira/browse/AIRFLOW-691
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, hooks
>Reporter: Daniel van der Ende
>Assignee: Daniel van der Ende
>Priority: Minor
>
> In situations with long-running commands that are executed via the SSH hook, 
> it is necessary to set the SSH keepalive option, with a corresponding 
> interval, to ensure the connection stays alive.
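
A hedged sketch of the underlying paramiko call such an option would wrap 
(connection details are placeholders; this is not the ssh_hook API itself):

{code:java}
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect('example.com', username='airflow')

# Send a keepalive packet every 30 seconds so long-running commands
# don't have their connection dropped by intermediate network gear.
client.get_transport().set_keepalive(30)
{code}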



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3419) S3_hook.select_key is broken on Python3

2018-12-12 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719209#comment-16719209
 ] 

jack commented on AIRFLOW-3419:
---

I'm not sure this is the reason... the decode is the same for Python 3 and 
Python 2.7

> S3_hook.select_key is broken on Python3
> ---
>
> Key: AIRFLOW-3419
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3419
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: boto3, hooks
>Affects Versions: 1.10.1
>Reporter: Maria Rebelka
>Priority: Major
>
> Hello,
> Using select_key throws an error:
> {quote}text = S3Hook('aws_conn').select_key(key='my_key',
>                                      bucket_name='my_bucket',
>                                      expression='SELECT * FROM S3Object s',
>                                      expression_type='SQL',
>                                      input_serialization={'JSON': \{'Type': 
> 'DOCUMENT'}},
>                                      output_serialization={'JSON': {}}){quote}
> Traceback (most recent call last):
> {quote}   File "db.py", line 31, in 
> output_serialization={'JSON': {}})
>   File "/usr/local/lib/python3.4/site-packages/airflow/hooks/S3_hook.py", 
> line 262, in select_key
> for event in response['Payload']
> TypeError: sequence item 0: expected str instance, bytes found{quote}
> Seems that the problem is in this line:
> S3_hook.py, line 262:  return ''.join(event['Records']['Payload']
> which probably should be return 
> ''.join(event['Records']['Payload'].decode('utf-8')
> From example in Amazon blog:
> https://aws.amazon.com/blogs/aws/s3-glacier-select/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3499) Add flag to Operators to write the Render to the log.

2018-12-11 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-3499:
--
Description: 
*Motivation:*

I have a few operators that use a Variable. The Variable is updated regularly 
and gets overwritten with more recent data.

Say I have this operator:

 
{code:java}
NEXT_ORDER_ID= Variable.get("next_order_id_to_import")
import_orders = MySqlToGoogleCloudStorageOperator(
    task_id='import',
    mysql_conn_id='c_production',
    google_cloud_storage_conn_id='gcp_m',
    approx_max_file_size_bytes = 1, 
    sql = 'Select … from … where orders_id between {{ params.next_order_id}} 
and {{ ti.xcom_pull('max_order_id') }}',
    params={'next_order_id_to_import': NEXT_ORDER_ID},
    bucket=GCS_BUCKET_ID,
    filename=file_name_orders_products,
    dag=dag)
{code}
 

 

 

The problem is that I cannot see the parameters for this query on the render 
page.

In fact, +there is no way of knowing what the actual query that the task 
executed was+.

 

To be exact:
{code:java}
 {{ ti.xcom_pull('max_order_id') }}{code}
 - Saved in the database, so it always shows the correct value per DAG run.

 
{code:java}
{{ params.next_order_id}} {code}
 - Will always show the most recent value, as this is not a DAG parameter and is 
not saved to the database. When clicking on the render page, it goes to the 
variable and takes the value from there, regardless of whether this was the value 
during the run or not.

 

 

*Suggested Solution:*

Since it's unlikely that the Render tab will be changed (as my use case could be 
different from how others use it), the best solution is simply to allow writing 
the Render as it was during the execution of the task to the task log. This 
will help to trace back issues.

 

Basically, add to all operators (BaseOperator?) a flag:
{code:java}
write_render_to_log {code}
whose default is False. If this flag is set to true, then the rendered content 
will be flushed to the log of the task.

 

  was:
*Motivation:*

I have few operators who uses Variable. The variable is updated consistently 
and gets overwritten with more recent data.

Say I have this operator:

 
{code:java}
NEXT_ORDER_ID= Variable.get("next_order_id_to_import")
import_orders = MySqlToGoogleCloudStorageOperator(
    task_id='import',
    mysql_conn_id='c_production',
    google_cloud_storage_conn_id='gcp_m',
    approx_max_file_size_bytes = 1, 
    sql = 'Select … from … where orders_id between {{ params.next_order_id}} 
and {{ ti.xcom_pull('max_order_id') }}',
    params={'next_order_id_to_import': NEXT_ORDER_ID},
    bucket=GCS_BUCKET_ID,
    filename=file_name_orders_products,
    dag=dag)
{code}
 

 

 

The problem is that If I can not see the parameters for this query on the 
render page.

To be exact : 
{code:java}
 {{ ti.xcom_pull('max_order_id') }}{code}
- Saved in Database so it always show the correct one per DAG.

 
{code:java}
{{ params.next_order_id}} {code}
- Will always show the most recent value as this is not DAG parameter and not 
saved to the database. When clicking on the render page it goes to the variable 
and take the value from there, regardless if this was the value during the run 
or not.

 

 

*Suggested Solution:*

Since it's unlikely that the Render tab will be change (as my use case could be 
different than how other use it) the best solution is simply to allow to write 
the Render as it was during the execution of the task to the task log. This 
will help to traceback issues.

 

Basically add to all operators (Base Operator?) a flag :
{code:java}
write_render_to_log {code}
which default is False. If this flag set to true than the render content will 
be flushed to the log of the task.

 


> Add flag to Operators to write the Render to the log.
> 
>
> Key: AIRFLOW-3499
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3499
> Project: Apache Airflow
>  Issue Type: Wish
>Affects Versions: 2.0.0
>Reporter: jack
>Priority: Major
>
> *Motivation:*
> I have few operators who uses Variable. The variable is updated consistently 
> and gets overwritten with more recent data.
> Say I have this operator:
>  
> {code:java}
> NEXT_ORDER_ID= Variable.get("next_order_id_to_import")
> import_orders = MySqlToGoogleCloudStorageOperator(
>     task_id='import',
>     mysql_conn_id='c_production',
>     google_cloud_storage_conn_id='gcp_m',
>     approx_max_file_size_bytes = 1, 
>     sql = 'Select … from … where orders_id between {{ params.next_order_id}} 
> and {{ ti.xcom_pull('max_order_id') }}',
>     params={'next_order_id_to_import': NEXT_ORDER_ID},
>     bucket=GCS_BUCKET_ID,
>     filename=file_name_orders_products,
>     dag=dag)
> {code}
>  
>  
>  
> The problem is that I can not see the parameters for this query on the render 
> page.
> In fact +there is no way of 

[jira] [Created] (AIRFLOW-3499) Add flag to Operators to write the Render to the log.

2018-12-11 Thread jack (JIRA)
jack created AIRFLOW-3499:
-

 Summary: Add flag to Operators to write the Render to the log.
 Key: AIRFLOW-3499
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3499
 Project: Apache Airflow
  Issue Type: Wish
Affects Versions: 2.0.0
Reporter: jack


*Motivation:*

I have a few operators that use a Variable. The Variable is updated regularly 
and gets overwritten with more recent data.

Say I have this operator:

 
{code:java}
NEXT_ORDER_ID= Variable.get("next_order_id_to_import")
import_orders = MySqlToGoogleCloudStorageOperator(
    task_id='import',
    mysql_conn_id='c_production',
    google_cloud_storage_conn_id='gcp_m',
    approx_max_file_size_bytes = 1, 
    sql = 'Select … from … where orders_id between {{ params.next_order_id}} 
and {{ ti.xcom_pull('max_order_id') }}',
    params={'next_order_id_to_import': NEXT_ORDER_ID},
    bucket=GCS_BUCKET_ID,
    filename=file_name_orders_products,
    dag=dag)
{code}
 

 

 

The problem is that I cannot see the parameters for this query on the 
render page.

To be exact:
{code:java}
 {{ ti.xcom_pull('max_order_id') }}{code}
- Saved in the database, so it always shows the correct value per DAG run.

 
{code:java}
{{ params.next_order_id}} {code}
- Will always show the most recent value, as this is not a DAG parameter and is 
not saved to the database. When clicking on the render page, it goes to the 
variable and takes the value from there, regardless of whether this was the 
value during the run or not.

 

 

*Suggested Solution:*

Since it's unlikely that the Render tab will be changed (as my use case could be 
different from how others use it), the best solution is simply to allow writing 
the Render as it was during the execution of the task to the task log. This 
will help to trace back issues.

 

Basically, add to all operators (BaseOperator?) a flag:
{code:java}
write_render_to_log {code}
whose default is False. If this flag is set to true, then the rendered content 
will be flushed to the log of the task.
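
A hedged sketch of what such a flag could look like (hypothetical, not an 
existing Airflow API); it assumes the operator already provides template_fields 
and self.log, as BaseOperator does:

{code:java}
class RenderLoggingMixin(object):
    """Mix into an operator to dump rendered template fields into the task log."""

    write_render_to_log = False

    def pre_execute(self, context):
        if self.write_render_to_log:
            # Log each rendered template field as it was at execution time.
            for field in self.template_fields:
                self.log.info("Rendered %s: %s", field, getattr(self, field))
{code}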

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1158) Multipart uploads to s3 cut off at nearest division

2018-12-11 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716518#comment-16716518
 ] 

jack commented on AIRFLOW-1158:
---

According to the pull request this is no longer needed because Airflow uses 
boto3 upload_file now instead of FileChunkIO.

 

This can be closed.

> Multipart uploads to s3 cut off at nearest division
> ---
>
> Key: AIRFLOW-1158
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1158
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws, hooks
>Reporter: Maksim Pecherskiy
>Assignee: Maksim Pecherskiy
>Priority: Minor
>
> When I try to upload a file of, say, 104 MB using multipart uploads of 10 MB 
> chunks, I get 10 chunks of 10 MB and that's it.  The 4 MB left over does not 
> get uploaded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2265) Splunk Hook/Operator

2018-12-11 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716497#comment-16716497
 ] 

jack commented on AIRFLOW-2265:
---

Hi [~yampelo], you can use the mailing list for some input. Also note that 
you'll need to write tests for this.

> Splunk Hook/Operator
> 
>
> Key: AIRFLOW-2265
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2265
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks, operators
>Reporter: Omer Yampel
>Priority: Minor
>
> I'm pretty new to airflow, but I'd like to develop a hook/operator for 
> Splunk. Ideally this would give me the ability to execute a search as a task 
> in a DAG, then use the data fetched from the results in the next tasks.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-532) DB API hook's insert rows sets autocommit non-generically

2018-12-11 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716489#comment-16716489
 ] 

jack commented on AIRFLOW-532:
--

[~ashb] can you close this?

> DB API hook's insert rows sets autocommit non-generically
> -
>
> Key: AIRFLOW-532
> URL: https://issues.apache.org/jira/browse/AIRFLOW-532
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Reporter: Luke Rohde
>Assignee: Luke Rohde
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3347) Unable to configure Kubernetes secrets through environment

2018-12-10 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715012#comment-16715012
 ] 

jack commented on AIRFLOW-3347:
---

[~cbandy] can you open a PR with the fix?

> Unable to configure Kubernetes secrets through environment
> --
>
> Key: AIRFLOW-3347
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3347
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration, kubernetes
>Affects Versions: 1.10.0
>Reporter: Chris Bandy
>Priority: Major
>
> We configure Airflow through environment variables. While setting up the 
> Kubernetes Executor, we wanted to pass the SQL Alchemy connection string to 
> workers by including it in the {{kubernetes_secrets}} section of the config.
> Unfortunately, even with 
> {{AIRFLOW__KUBERNETES_SECRETS__AIRFLOW__CORE__SQL_ALCHEMY_CONN}} set in 
> the scheduler environment, the worker gets no secret environment 
> variables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-54) Special tags when referring to other tasks

2018-12-10 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714807#comment-16714807
 ] 

jack commented on AIRFLOW-54:
-

This is a great idea

+1 for it as well

> Special tags when referring to other tasks
> --
>
> Key: AIRFLOW-54
> URL: https://issues.apache.org/jira/browse/AIRFLOW-54
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks, operators
>Reporter: Adam Mustafa
>Assignee: Adam Mustafa
>Priority: Minor
>
> At the moment when you need to refer to other tasks, you either have to refer 
> to them by name or search through a list of upstream objects. 
> For example, in order to get the task_ids needed by xcom_pull for the 
> upstream items, you need to do:
> 
>  ti.xcom_pull(task_ids=[task.task_id for task in self.upstream_list])[0]
> This issue suggests adding a keyword operator similar in form to those the 
> schedule would use. 
> Possible items might include:
> @upstream: Tasks directly upstream <--- *change this to @parent*
> @ancestor: Any task that is in the ancestor tree of this dag
> @sibling: other tasks stemming from the same upstream tasks
> @children: tasks from the children of this node
> Aside from providing more simple arguments, this also improves the 
> readability of the functions.
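For reference, a small sketch of the current approach the description refers to, as 
it would look inside a PythonOperator callable (names are illustrative; the callable 
would be wired up with provide_context=True):
{code:java}
# Current approach: build the upstream task_id list explicitly before pulling XComs.
def consume_upstream(**context):
    ti = context['ti']
    upstream_ids = [task.task_id for task in context['task'].upstream_list]
    values = ti.xcom_pull(task_ids=upstream_ids)
    # A keyword such as @upstream would replace the explicit list comprehension above.
    return values
{code}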



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-123) Differentiate Between Failure and Timeout

2018-12-10 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714804#comment-16714804
 ] 

jack commented on AIRFLOW-123:
--

There can be various reasons for DAG failure. Why is "timeout", or more 
specifically max tries reached, so important that it needs its own status? 

> Differentiate Between Failure and Timeout
> -
>
> Key: AIRFLOW-123
> URL: https://issues.apache.org/jira/browse/AIRFLOW-123
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: db, operators, ui
>Reporter: Kevin Mandich
>Assignee: Norman Mu
>Priority: Minor
>
> Currently, timeouts (when a task's runtime reaches the task-defined timeout) 
> are classified as a failure. It would be useful to differentiate states 
> between a failure due to a timeout and, as an example, a failure due to an 
> exception being raised in a PythonOperator.
> At a minimum, it would be useful to see a task which failed due to timeout 
> visualized as a different color in the Tree View. It would also be nice to 
> utilize this classification as part of DAG logic. Examples here include an 
> option to retry on timeouts specifically, and the ability to take a different 
> action downstream if a particular task failed due to timeout.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1396) Transfer data from BigQuery to MySQL operator

2018-12-10 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714783#comment-16714783
 ] 

jack commented on AIRFLOW-1396:
---

There isn't any direct operator from any database to BigQuery.

Operators target GCS, and from there you can use the 
GoogleCloudStorageToBigQueryOperator.

 

The same applies in the other direction: use the BigQueryToCloudStorageOperator and 
from there you can load the CSV/JSON into a database of your choice.

 

I think this one is a won't fix.
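A minimal sketch of that route, assuming the contrib operator name and parameters 
from 1.10 (project, dataset, bucket and file names are placeholders, and a dag 
object is assumed to exist):
{code:java}
# Step 1: export the table from BigQuery to GCS; a downstream task can then
# load the exported CSV into MySQL (or any other database of your choice).
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator

export_to_gcs = BigQueryToCloudStorageOperator(
    task_id='bq_to_gcs',
    source_project_dataset_table='my_project.my_dataset.my_table',
    destination_cloud_storage_uris=['gs://my-bucket/exports/my_table_*.csv'],
    export_format='CSV',
    dag=dag)
{code}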

> Transfer data from BigQuery to MySQL operator
> -
>
> Key: AIRFLOW-1396
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1396
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, operators
>Reporter: Yu Ishikawa
>Priority: Major
>
> We should have an operator to transfer queried data from bigquery to mysql 
> like {{HiveToMySqlTransfer}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1417) Add tests for BigQueryToCloudStorageOperator

2018-12-10 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714781#comment-16714781
 ] 

jack commented on AIRFLOW-1417:
---

I'm surprised we don't have that yet. 

> Add tests for BigQueryToCloudStorageOperator
> 
>
> Key: AIRFLOW-1417
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1417
> Project: Apache Airflow
>  Issue Type: Test
>  Components: contrib, operators, tests
>Reporter: Yu Ishikawa
>Priority: Minor
>
> h2.Goals
> - Add unit tests for {{BigQueryToCloudStorageOperator}}
> - Especially, test for macro in {{destination_cloud_storage_uris}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1026) connection string using _cmd tin airflow.cfg is broken

2018-12-10 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714687#comment-16714687
 ] 

jack commented on AIRFLOW-1026:
---

SQLite cannot be used with the LocalExecutor.

> connection string using _cmd tin airflow.cfg is broken
> --
>
> Key: AIRFLOW-1026
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1026
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.8.0
>Reporter: Harish Singh
>Priority: Critical
> Fix For: 1.8.0
>
>
> sql_alchemy_conn_cmd = python ./pipeline/dags/configure.py
> I am expecting configure.py to be invoked.
> But it just throws:
>  "cannot use sqlite with the LocalExecutor"
> The connection string that my script "configure.py" would return is something 
> like this:
> mysql+mysqldb://username:**@mysqlhostname:3306/airflowdbname
> But after debugging, I found that, my script is not getting invoked at all.
> This is my airflow.cfg:
> executor = LocalExecutor
> sql_alchemy_conn_cmd = python ./pipeline/dags/configure.py 
> sql_alchemy_pool_size = 5
> sql_alchemy_pool_recycle = 3600
> I tried not using the script and directly hardcoding the conn_url
> sql_alchemy_conn = 
> mysql+mysqldb://username:**@mysqlhostname:3306/airflowdbname
> It works.
> But  there is a regression bug if somebody wants to use "sql_alchemy_conn_cmd"
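For reference, the _cmd mechanism is expected to run the command and read the 
connection string from its stdout, so a working configure.py would be roughly this 
(credentials are placeholders):
{code:java}
# configure.py: whatever this script prints to stdout becomes sql_alchemy_conn.
print("mysql+mysqldb://username:password@mysqlhostname:3306/airflowdbname")
{code}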



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1822) Add gaiohttp and gthread gunicorn workerclass option in cli

2018-12-08 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713881#comment-16713881
 ] 

jack commented on AIRFLOW-1822:
---

This ticket also has a pending PR:

[https://github.com/apache/incubator-airflow/pull/2794]

If the ticket is closed, the PR needs to be closed as well.

[~ashb]

> Add gaiohttp and gthread gunicorn workerclass option in cli
> ---
>
> Key: AIRFLOW-1822
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1822
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sanjay Pillai
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The gunicorn minimum version has been updated to 19.40; 
> we need to add CLI support for the gthread and gaiohttp worker classes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1490) In order to get details on exceptions thrown by tasks, the onfailure callback needs an enhancement

2018-12-06 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711477#comment-16711477
 ] 

jack commented on AIRFLOW-1490:
---

You can prepare a PR with your code suggestions.

> In order to get details on exceptions thrown by tasks, the onfailure callback 
> needs an enhancement
> --
>
> Key: AIRFLOW-1490
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1490
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.8.0, 2.0.0
>Reporter: Steen Manniche
>Priority: Major
> Attachments: 
> 0001-AIRFLOW-1490-carry-exceptions-through-to-the-on_fail.patch
>
>
> The code called when an exception is thrown by a task receives information on 
> the exception thrown from the task, but fails to delegate this information to 
> the registered callbacks. 
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1524
>  sends the context to the registered failure callback, but this context does 
> not include the thrown exception.
> The supplied patch proposes a non-api-breaking way of including the exception 
> in the context in order to provide clients with the full exception type and 
> traceback.
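A minimal sketch of how a callback could use that information once it is passed 
through (the 'exception' context key is what the patch proposes, not current 1.8.0 
behaviour; task names are illustrative and a dag object is assumed to exist):
{code:java}
# Illustrative failure callback reading the exception the patch would add to the context.
import logging
from airflow.operators.python_operator import PythonOperator


def notify_failure(context):
    exc = context.get('exception')  # populated only if the proposed patch is applied
    logging.error("Task %s failed with: %r",
                  context['task_instance'].task_id, exc)


def flaky_callable():
    raise ValueError("boom")


flaky_task = PythonOperator(
    task_id='flaky_task',
    python_callable=flaky_callable,
    on_failure_callback=notify_failure,
    dag=dag)
{code}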



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1405) Airflow v 1.8.1 unable to properly initialize with MySQL

2018-12-06 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711385#comment-16711385
 ] 

jack commented on AIRFLOW-1405:
---

I think the best solution for this is for Airflow to catch this exception and 
replace it with a message telling the user that a newer version of MySQL is 
required.
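A rough sketch of that idea against the failing migration (hypothetical code, not 
something that exists in Airflow; the exact exception type would need to match what 
MySQL/alembic raises, and to the best of my knowledge fractional seconds need MySQL 
5.6.4 or newer):
{code:java}
# Hypothetical: wrap the fractional-seconds migration so the failure explains itself.
from alembic import op
from sqlalchemy.dialects import mysql


def upgrade():
    try:
        op.alter_column(table_name='dag', column_name='last_scheduler_run',
                        type_=mysql.DATETIME(fsp=6))
    except Exception as exc:
        raise Exception("Could not add fractional-second DATETIME columns (%s). "
                        "This usually means the MySQL server is too old; "
                        "MySQL 5.6.4 or newer is required." % exc)
{code}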

> Airflow v 1.8.1 unable to properly initialize with MySQL
> 
>
> Key: AIRFLOW-1405
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1405
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: 1.8.1
> Environment: CentOS7
>Reporter: Aakash Bhardwaj
>Priority: Major
> Fix For: 1.8.1
>
> Attachments: error_log.txt
>
>
> While working on a CentOS7 system, I was trying to configure Airflow version 
> 1.8.1 to run with MySql in the backend.
> I have installed Airflow in a Virtual Environment, and the MySQL has a 
> database named airflow (default).
> But on running the command -
> {code:none}
> airflow initdb
> {code}
> the following error is reported
> {noformat}
> [2017-07-12 13:22:36,558] {__init__.py:57} INFO - Using executor LocalExecutor
> DB: mysql://airflow:***@localhost/airflow
> [2017-07-12 13:22:37,218] {db.py:287} INFO - Creating tables
> INFO  [alembic.runtime.migration] Context impl MySQLImpl.
> INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 
> 4addfa1236f1, Add fractional seconds to mysql tables
> Traceback (most recent call last):
>   File "/opt/airflow_virtual_environment/airflow_venv/bin/airflow", line 28, 
> in 
> args.func(args)
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/airflow/bin/cli.py",
>  line 951, in initdb
> db_utils.initdb()
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 106, in initdb
> upgradedb()
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 294, in upgradedb
> command.upgrade(config, 'heads')
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/command.py",
>  line 174, in upgrade
> script.run_env()
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/script/base.py",
>  line 416, in run_env
> util.load_python_file(self.dir, 'env.py')
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/util/pyfiles.py",
>  line 93, in load_python_file
> module = load_module_py(module_id, path)
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/util/compat.py",
>  line 79, in load_module_py
> mod = imp.load_source(module_id, path, fp)
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/airflow/migrations/env.py",
>  line 86, in 
> run_migrations_online()
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/airflow/migrations/env.py",
>  line 81, in run_migrations_online
> context.run_migrations()
>   File "", line 8, in run_migrations
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/runtime/environment.py",
>  line 807, in run_migrations
> self.get_context().run_migrations(**kw)
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/runtime/migration.py",
>  line 321, in run_migrations
> step.migration_fn(**kw)
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/airflow/migrations/versions/4addfa1236f1_add_fractional_seconds_to_mysql_tables.py",
>  line 36, in upgrade
> op.alter_column(table_name='dag', column_name='last_scheduler_run', 
> type_=mysql.DATETIME(fsp=6))
>   File "", line 8, in alter_column
>   File "", line 3, in alter_column
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/operations/ops.py",
>  line 1420, in alter_column
> return operations.invoke(alt)
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/operations/base.py",
>  line 318, in invoke
> return fn(self, operation)
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/operations/toimpl.py",
>  line 53, in alter_column
> **operation.kw
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/ddl/mysql.py",
>  line 67, in alter_column
> else existing_autoincrement
>   File 
> "/opt/airflow_virtual_environment/airflow_venv/lib/python2.7/site-packages/alembic/ddl/impl.py",
>  line 

[jira] [Work stopped] (AIRFLOW-3071) Unable to clear Val of Variable from the UI

2018-12-05 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3071 stopped by jack.
-
> Unable to clear Val of Variable from the UI
> ---
>
> Key: AIRFLOW-3071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3071
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Minor
>  Labels: easyfix
>
> This is quite an annoying bug.
>  
> Reproduce steps:
>  # Create a Variable.
>  # Give the Variable a Val & save it.
>  # Click edit Variable. You will see the Key with Red {color:#FF}*{color} 
> and the value that you entered.
>  # Remove the Val (leave the field blank) and click save.
>  # No errors will appear. However if you will re-enter to the Variable you 
> will see that the Blank value was not saved.
>  
> Please allow removing the Val. This is also the intended behavior because the Val 
> has no {color:#FF}*{color} near it.
> The current workaround is to delete the Variable and re-create it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-3071) Unable to clear Val of Variable from the UI

2018-12-05 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack reassigned AIRFLOW-3071:
-

Assignee: (was: jack)

> Unable to clear Val of Variable from the UI
> ---
>
> Key: AIRFLOW-3071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3071
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Minor
>  Labels: easyfix
>
> This is quite an annoying bug.
>  
> Reproduce steps:
>  # Create a Variable.
>  # Give the Variable a Val & save it.
>  # Click edit Variable. You will see the Key with Red {color:#FF}*{color} 
> and the value that you entered.
>  # Remove the Val (leave the field blank) and click save.
>  # No errors will appear. However if you will re-enter to the Variable you 
> will see that the Blank value was not saved.
>  
> Please allow removing the Val. This is also the intended behavior because the Val 
> has no {color:#FF}*{color} near it.
> The current workaround is to delete the Variable and re-create it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3454) Task duration isn't updated while task is running

2018-12-05 Thread jack (JIRA)
jack created AIRFLOW-3454:
-

 Summary: Task duration isn't updated while task is running
 Key: AIRFLOW-3454
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3454
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: jack
 Attachments: task.png

When a task is running you can hover over it in the UI and get some details.

The Duration value isn't updated. 

It shows the same value whether the task has been running for 2 min or 30 min.

Refreshing does not help.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3058) Airflow log & multi-threading

2018-12-05 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710075#comment-16710075
 ] 

jack commented on AIRFLOW-3058:
---

Yep. Adding the -u flag to the script invocation solves this issue.

I think it's worth mentioning somewhere in the docs though... the problem shown 
here isn't the delay of the print but the fact that the timestamp is incorrect.

The timestamp is of the actual print to the log and not of the time the line was 
written to the buffer. This is highly confusing. 
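For anyone hitting the same thing, a minimal illustration of the workaround 
(my_script.py is a placeholder and a dag object is assumed to exist):
{code:java}
# Run the multi-threaded script unbuffered so log timestamps match when lines were printed.
from airflow.operators.bash_operator import BashOperator

run_script = BashOperator(
    task_id='run_script',
    bash_command='python -u my_script.py ',  # trailing space stops Jinja treating the value as a template file
    dag=dag)
{code}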

> Airflow log & multi-threading
> -
>
> Key: AIRFLOW-3058
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3058
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: jack
>Priority: Major
> Attachments: 456.PNG, Sni.PNG
>
>
> The airflow log does not show messages in real time when executing scripts 
> with Multi-threading.
>  
> for example:
>  
> The left is the Airflow log time; the right is the actual time of the print 
> in my code. If I executed the script without Airflow, the console would 
> show the times on the right.
> !Sni.PNG!
> {code:java}
> 2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 
> 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 14:14:55.230044 
> Thread: Thread-1 Generate page: #0 run #0 with URL: 
> http://...=2=0=1000
> [2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.231635 Thread: Thread-2 Generate page: #1 run #0 with URL: 
> http://...=2=1000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.233226 Thread: Thread-3 Generate page: #2 run #0 with URL: 
> http://...=2=2000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.234020 Thread: Thread-4 Generate page: #3 run #0 with URL: 
> http://...=2=3000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:43.100122 Thread: Thread-1 page 0 finished. Length is 1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:43.100877 Thread: Thread-1 Generate page: #4 run #0 with URL: 
> http://...=2=4000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:46.254536 Thread: Thread-3 page 2 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:46.255508 Thread: Thread-3 Generate page: #5 run #0 with URL: 
> http://...=2=5000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:51.096360 Thread: Thread-2 page 1 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:51.097269 Thread: Thread-2 Generate page: #6 run #0 with URL: 
> http://...=2=6000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:53.112621 Thread: Thread-4 page 3 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:53.113455 Thread: Thread-4 Generate page: #7 run #0 with URL: 
> http://...=2=7000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:37.345343 Thread: Thread-3 Generate page: #8 run #0 with URL: 
> http://...=2=8000=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:37.701201 Thread: Thread-2 Generate page: #9 run #0 with URL: 
> http://...=2=9000=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:47.283796 Thread: Thread-1 page 4 finished. Length is 1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 
> 14:17:27.169359 Thread: Thread-2 page 9 finished. Length is 1000
>  
> {code}
> This never happens when executing regular code.. Happens only with 
> multi-threading. I have some other scripts that the 

[jira] [Closed] (AIRFLOW-3058) Airflow log & multi-threading

2018-12-05 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack closed AIRFLOW-3058.
-
Resolution: Invalid

> Airflow log & multi-threading
> -
>
> Key: AIRFLOW-3058
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3058
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: jack
>Priority: Major
> Attachments: 456.PNG, Sni.PNG
>
>
> The airflow log does not show messages in real time when executing scripts 
> with Multi-threading.
>  
> for example:
>  
> The left is the Airflow log time; the right is the actual time of the print 
> in my code. If I executed the script without Airflow, the console would 
> show the times on the right.
> !Sni.PNG!
> {code:java}
> 2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 
> 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 14:14:55.230044 
> Thread: Thread-1 Generate page: #0 run #0 with URL: 
> http://...=2=0=1000
> [2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.231635 Thread: Thread-2 Generate page: #1 run #0 with URL: 
> http://...=2=1000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.233226 Thread: Thread-3 Generate page: #2 run #0 with URL: 
> http://...=2=2000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.234020 Thread: Thread-4 Generate page: #3 run #0 with URL: 
> http://...=2=3000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:43.100122 Thread: Thread-1 page 0 finished. Length is 1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:43.100877 Thread: Thread-1 Generate page: #4 run #0 with URL: 
> http://...=2=4000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:46.254536 Thread: Thread-3 page 2 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:46.255508 Thread: Thread-3 Generate page: #5 run #0 with URL: 
> http://...=2=5000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:51.096360 Thread: Thread-2 page 1 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:51.097269 Thread: Thread-2 Generate page: #6 run #0 with URL: 
> http://...=2=6000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:53.112621 Thread: Thread-4 page 3 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:53.113455 Thread: Thread-4 Generate page: #7 run #0 with URL: 
> http://...=2=7000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:37.345343 Thread: Thread-3 Generate page: #8 run #0 with URL: 
> http://...=2=8000=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:37.701201 Thread: Thread-2 Generate page: #9 run #0 with URL: 
> http://...=2=9000=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:47.283796 Thread: Thread-1 page 4 finished. Length is 1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 
> 14:17:27.169359 Thread: Thread-2 page 9 finished. Length is 1000
>  
> {code}
> This never happens when executing regular code; it happens only with 
> multi-threading. I have some other scripts where the Airflow print appears 
> after more than 30 minutes.
>  
>  Check this one:
> hours of delay and then everything is printed together. These are not real 
> time; the prints in the log have no correlation to the actual time the command 
> was executed.
>  
> !456.PNG!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1370) Scheduler is crashing because of IntegrityError

2018-12-02 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706336#comment-16706336
 ] 

jack commented on AIRFLOW-1370:
---

Possibly duplicate of: https://issues.apache.org/jira/browse/AIRFLOW-2219

> Scheduler is crashing because of IntegrityError
> ---
>
> Key: AIRFLOW-1370
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1370
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, scheduler
>Affects Versions: 1.8.0
>Reporter: Maneesh Sharma
>Priority: Major
>
> Scheduler is crashing with multiple task running on Celery Executor. It is 
> throwing `{color:red}IntegrityError: (psycopg2.IntegrityError) duplicate key 
> value violates unique constraint "task_instance_pkey"{color}`. Below is the 
> complete stack trace of error --
> Process DagFileProcessor490-Process:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in 
> _bootstrap
> self.run()
>   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
> self._target(*self._args, **self._kwargs)
>   File "/home/ubuntu/.local/lib/python2.7/site-packages/airflow/jobs.py", 
> line 348, in helper
> pickle_dags)
>   File "/home/ubuntu/.local/lib/python2.7/site-packages/airflow/utils/db.py", 
> line 53, in wrapper
> result = func(*args, **kwargs)
>   File "/home/ubuntu/.local/lib/python2.7/site-packages/airflow/jobs.py", 
> line 1587, in process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File "/home/ubuntu/.local/lib/python2.7/site-packages/airflow/jobs.py", 
> line 1176, in _process_dags
> self._process_task_instances(dag, tis_out)
>   File "/home/ubuntu/.local/lib/python2.7/site-packages/airflow/jobs.py", 
> line 880, in _process_task_instances
> run.verify_integrity(session=session)
>   File "/home/ubuntu/.local/lib/python2.7/site-packages/airflow/utils/db.py", 
> line 53, in wrapper
> result = func(*args, **kwargs)
>   File "/home/ubuntu/.local/lib/python2.7/site-packages/airflow/models.py", 
> line 4117, in verify_integrity
> session.commit()
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", 
> line 906, in commit
> self.transaction.commit()
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", 
> line 461, in commit
> self._prepare_impl()
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", 
> line 441, in _prepare_impl
> self.session.flush()
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", 
> line 2171, in flush
> self._flush(objects)
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", 
> line 2291, in _flush
> transaction.rollback(_capture_exception=True)
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py",
>  line 66, in __exit__
> compat.reraise(exc_type, exc_value, exc_tb)
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", 
> line 2255, in _flush
> flush_context.execute()
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py",
>  line 389, in execute
> rec.execute(self)
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py",
>  line 548, in execute
> uow
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py",
>  line 181, in save_obj
> mapper, table, insert)
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py",
>  line 799, in _emit_insert_statements
> execute(statement, multiparams)
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", 
> line 945, in execute
> return meth(self, multiparams, params)
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", 
> line 263, in _execute_on_connection
> return connection._execute_clauseelement(self, multiparams, params)
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", 
> line 1053, in _execute_clauseelement
> compiled_sql, distilled_params
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", 
> line 1189, in _execute_context
> context)
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", 
> line 1402, in _handle_dbapi_exception
> exc_info
>   File 
> "/home/ubuntu/.local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", 
> line 203, in raise_from_cause
> reraise(type(exception), exception, tb=exc_tb, cause=cause)
>   File 
> 

[jira] [Commented] (AIRFLOW-2219) Race condition to DagRun.verify_integrity between Scheduler and Webserver

2018-12-02 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706338#comment-16706338
 ] 

jack commented on AIRFLOW-2219:
---

Possibly duplicate of 
https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-1370

> Race condition to DagRun.verify_integrity between Scheduler and Webserver
> -
>
> Key: AIRFLOW-2219
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2219
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun, db, scheduler, webserver
>Affects Versions: 1.8.1, 1.9.0
>Reporter: Will Wong
>Priority: Trivial
>
> Symptoms:
>  * Triggering dag causes the 404 nuke page with an error message along the 
> lines of: {{psycopg2.IntegrityError: duplicate key value violates unique 
> constraint "task_instance_pkey"}} when calling {{DagRun.verify_integrity}}
> Or
>  * Similar error in scheduler log for dag file when scheduling a DAG. 
> (Example exception at the end of description)
> This occurs because {{Dag.create_dagrun}} commits a the dag_run entry to the 
> database and then runs {{verify_integrity}} to add the task_instances 
> immediately. However, the scheduler already picks up a dag run before all 
> task_instances are created and also calls {{verify_integrity}} to create 
> task_instances at the same time.
> I don't _think_ this actually breaks anything in particular. The exception 
> happens either on the webpage or in the scheduler logs:
>  * If it occurs in the UI, it just scares people thinking something broke but 
> the task_instances will be created by the scheduler.
>  * If the error shows up in the scheduler, the task_instances are created by 
> the webserver and it continues processing the DAG during the next loop.
>  
>  I'm not sure if {{DagRun.verify_integrity}} is necessary for both 
> {{SchedulerJob._process_task_instances}} as well {{Dag.create_dagrun}} but 
> perhaps we can just stick to one?
>  
> {noformat}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1170, in _execute_context
>     context)
>   File 
> "/usr/local/lib/python3.6/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py",
>  line 683, in do_executemany
>     cursor.executemany(statement, parameters)
> psycopg2.IntegrityError: duplicate key value violates unique constraint 
> "task_instance_pkey"
> DETAIL:  Key (task_id, dag_id, execution_date)=(docker_task_10240_7680_0, 
> chunkedgraph_edgetask_scheduler, 2018-03-15 23:46:57.116673) already exists.
> The above exception was the direct cause of the following exception:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 371, in 
> helper
>     pickle_dags)
>   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, 
> in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 1792, 
> in process_file
>     self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 1391, 
> in _process_dags
>     self._process_task_instances(dag, tis_out)
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 915, in 
> _process_task_instances
>     run.verify_integrity(session=session)
>   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, 
> in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 4786, 
> in verify_integrity
>     session.commit()
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", 
> line 943, in commit
>     self.transaction.commit()
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", 
> line 467, in commit
>     self._prepare_impl()
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", 
> line 447, in _prepare_impl
>     self.session.flush()
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", 
> line 2254, in flush
>     self._flush(objects)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", 
> line 2380, in _flush
>     transaction.rollback(_capture_exception=True)
>   File 
> "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 
> 66, in __exit__
>     compat.reraise(exc_type, exc_value, exc_tb)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", 
> line 187, in reraise
>     raise value
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", 
> line 2344, in _flush
>     flush_context.execute()
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", 
> line 391, in execute
>     

[jira] [Commented] (AIRFLOW-3364) Change Ooops Exception to something meaningful

2018-11-27 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700408#comment-16700408
 ] 

jack commented on AIRFLOW-3364:
---

This could be related to: 
[https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-419]

I'm also running PostgreSQL, and the issue occurs when trying to change task 
status.

Could be more than a mere coincidence.

> Change Ooops Exception to something meaningful 
> -
>
> Key: AIRFLOW-3364
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3364
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: jack
>Priority: Major
> Attachments: oops.PNG
>
>
> When I try to manually change a task's state (from up_for_retry) to success or 
> whatever.
> I often get:
> h1.  
> {code:java}
> Ooops.  Traceback (most recent call last): File 
> "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1988, in wsgi_app 
> response = self.full_dispatch_request() File 
> "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1641, in 
> full_dispatch_request rv = self.handle_user_exception(e) File 
> "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1544, in 
> handle_user_exception reraise(exc_type, exc_value, tb) File 
> "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1639, in 
> full_dispatch_request rv = self.dispatch_request() File 
> "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1625, in 
> dispatch_request return self.view_functions[rule.endpoint](**req.view_args) 
> File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 69, 
> in inner return self._run_view(f, *args, **kwargs) File 
> "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 368, in 
> _run_view return fn(self, *args, **kwargs) File 
> "/usr/local/lib/python2.7/dist-packages/flask_admin/model/base.py", line 
> 2068, in action_view return self.handle_action() File 
> "/usr/local/lib/python2.7/dist-packages/flask_admin/actions.py", line 113, in 
> handle_action response = handler[0](ids) File 
> "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 2502, in 
> action_set_success self.set_task_instance_state(ids, State.SUCCESS) File 
> "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 50, in 
> wrapper result = func(*args, **kwargs) File 
> "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 2540, in 
> set_task_instance_state raise Exception("Ooops") Exception: Ooops{code}
>  
>  
> It's unclear what this Ooops is and why it's needed.
> If there is an exception it should show the actual error.
>  
> Usually I simply delete the task to solve this. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1703) Airflow LocalExecutor crashes after 3 hours of work. Database is locked

2018-11-27 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700393#comment-16700393
 ] 

jack commented on AIRFLOW-1703:
---

SQLite with the LocalExecutor? This doesn't make a lot of sense.

> Airflow LocalExecutor crashes after 3 hours of work. Database is locked
> ---
>
> Key: AIRFLOW-1703
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1703
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, worker
>Affects Versions: 1.8.0
> Environment: Single CentOS virtual server
>Reporter: Kirill Dubovikov
>Priority: Major
> Attachments: nohup.out
>
>
> Airflow consistently crashes after working several hours on a single node 
> when using SQLite DB. Our DAG is scheduled to run {{@daily}}. We launch 
> airflow using the following commands
> {code:sh}
> airflow scheduler
> airflow webserver -p 8080
> {code}
> After a while worker and webserver crash with the following error: 
> {{sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is 
> locked [SQL: 'SELECT connection.conn_id AS connection_conn_id \nFROM 
> connection GROUP BY connection.conn_id']}}
> I've attached full logs for further investigation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2076) [2017-11-21 16:12:40,961] {jobs.py:187} DEBUG - [heart] Boom.

2018-11-27 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700382#comment-16700382
 ] 

jack commented on AIRFLOW-2076:
---

You are using an old version of Airflow. Can you check it against master?

> [2017-11-21 16:12:40,961] {jobs.py:187} DEBUG - [heart] Boom.
> -
>
> Key: AIRFLOW-2076
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2076
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 1.8.0, 1.8.2, 2.0.0
>Reporter: fann
>Priority: Critical
>  Labels: features
>
> I used airflow submit spark application, the airflow DAG always print DEBUG - 
> [heart] Boom. When my spark application is finished, the print is still 
> on,and the DAG cannot finished!
> I was used the LocalExecutor. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1739) Cleanup naming ambiguity with TestDbApiHook test class

2018-11-26 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699973#comment-16699973
 ] 

jack commented on AIRFLOW-1739:
---

PR was merged. Ticket can be closed.

> Cleanup naming ambiguity with TestDbApiHook test class
> --
>
> Key: AIRFLOW-1739
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1739
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Andy Hadjigeorgiou
>Assignee: Andy Hadjigeorgiou
>Priority: Trivial
>
> The TestDbApiHook class creates a class whose name is TestDBApiHook - I'm 
> proposing a simple naming change to make sure the two are distinguishable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3399) BashOperator - bash_command is not rechecked after retry

2018-11-26 Thread jack (JIRA)
jack created AIRFLOW-3399:
-

 Summary: BashOperator - bash_command is not rechecked after retry
 Key: AIRFLOW-3399
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3399
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: jack


The operator does not check whether the bash_command parameter was changed between 
retries.

 

Steps to reproduce:

Create operator:
{code:java}
test = BashOperator(task_id='my_id',
 bash_command="python3.6 testing.py '{{ params.start_date }}' ",
 dag=dag){code}
 

 

Say that testing.py fails (wrong py file given) - the task is set to up_for_retry.

Change the command to point to the right file:

 
{code:java}
test = BashOperator(task_id='my_id',
 bash_command="python3.6 testing_a.py '{{ params.start_date }}' ",
 dag=dag){code}
 

Clear the task.

Airflow will execute testing.py again and not testing_a.py.

 

Only when clearing the upstream task of the BashOperator does it re-read the 
python file in the bash command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2813) `pip install apache-airflow` fails on Python 3.7

2018-11-26 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698632#comment-16698632
 ] 

jack commented on AIRFLOW-2813:
---

[~ashb] this was resolved in 1.10.1, wasn't it?

> `pip install apache-airflow` fails on Python 3.7
> 
>
> Key: AIRFLOW-2813
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2813
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
> Environment: Mac OS, Linux, Windows
>Reporter: Jeff Schwab
>Assignee: Ash Berlin-Taylor
>Priority: Major
> Fix For: 1.10.1
>
>
> `pip install apache-airflow` fails with a SyntaxError on Mac OS, and with a 
> different (extremely verbose) error on Linux.  This happens both on my 
> MacBook and on a fresh Alpine Linux Docker image, and with both pip2 and 
> pip3; a friend just tried `pip install apache-airflow` for me on his Windows 
> box, and it died with yet another error.  Googling quickly found someone else 
> seeing the same issue over a week ago: 
> https://gitter.im/apache/incubator-airflow?at=5b5130bac86c4f0b47201af0
> Please let me know what further information you would like, and/or what I am 
> doing wrong.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3302) Small CSS fixes

2018-11-26 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698627#comment-16698627
 ] 

jack commented on AIRFLOW-3302:
---

PR was merged. The ticket can be closed.

> Small CSS fixes
> ---
>
> Key: AIRFLOW-3302
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3302
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Sumit Maheshwari
>Assignee: Sumit Maheshwari
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3301) Update CI test for [AIRFLOW-3132] (PR #3977)

2018-11-26 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698626#comment-16698626
 ] 

jack commented on AIRFLOW-3301:
---

PR was merged. This can be closed.

> Update CI test for [AIRFLOW-3132] (PR #3977)
> 
>
> Key: AIRFLOW-3301
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3301
> Project: Apache Airflow
>  Issue Type: Test
>  Components: tests
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> In PR [https://github.com/apache/incubator-airflow/pull/3977,] test is not 
> updated accordingly, and it results in CI failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3364) Change Ooops Exception to something meaningful

2018-11-19 Thread jack (JIRA)
jack created AIRFLOW-3364:
-

 Summary: Change Ooops Exception to something meaningful 
 Key: AIRFLOW-3364
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3364
 Project: Apache Airflow
  Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: jack
 Attachments: oops.PNG

When I try to manually change a task's state (from up_for_retry) to success or 
whatever.

I often get:
h1.  
{code:java}
Ooops.  Traceback (most recent call last): File 
"/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1988, in wsgi_app 
response = self.full_dispatch_request() File 
"/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1641, in 
full_dispatch_request rv = self.handle_user_exception(e) File 
"/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1544, in 
handle_user_exception reraise(exc_type, exc_value, tb) File 
"/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1639, in 
full_dispatch_request rv = self.dispatch_request() File 
"/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1625, in 
dispatch_request return self.view_functions[rule.endpoint](**req.view_args) 
File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 69, in 
inner return self._run_view(f, *args, **kwargs) File 
"/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 368, in 
_run_view return fn(self, *args, **kwargs) File 
"/usr/local/lib/python2.7/dist-packages/flask_admin/model/base.py", line 2068, 
in action_view return self.handle_action() File 
"/usr/local/lib/python2.7/dist-packages/flask_admin/actions.py", line 113, in 
handle_action response = handler[0](ids) File 
"/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 2502, in 
action_set_success self.set_task_instance_state(ids, State.SUCCESS) File 
"/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 50, in 
wrapper result = func(*args, **kwargs) File 
"/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 2540, in 
set_task_instance_state raise Exception("Ooops") Exception: Ooops{code}
 

 

It's unclear what this Ooops is and why it's needed.

If there is an exception it should show the actual error.

 

Usually I simply delete the task to solve this. 
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2775) Recent Tasks Label Misleading

2018-11-18 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690833#comment-16690833
 ] 

jack commented on AIRFLOW-2775:
---

Isn't it better to just say "tasks from the most recent run"? Whether the run is 
active or not, the status of the task will tell (light green vs. dark green).

> Recent Tasks Label Misleading
> -
>
> Key: AIRFLOW-2775
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2775
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: 1.9.0
>Reporter: Matthias Niehoff
>Priority: Major
> Attachments: recent-tasks.png
>
>
> The label for the Recent Tasks in the DAGs UI is misleading. 
> The mouse over label says: "Status of tasks from all active DAG runs or, if 
> not currently active, from most recent run."
> While the "not currently active" part is correct, the active DAG run is 
> incorrect. Shown are the status of the task from all active DAG runs plus the 
> tasks from the most recent run". When the run finishes the task from the 
> previous run are removed from the view and only the tasks of the most recent 
> run are shown.
> Either the label should be updated to reflect this
> or
> Only the tasks of the current run will be shown, without the task of the last 
> finished run 
>  
> Imho the second options makes more sense.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3350) docs - explain how to use bitshift with lists

2018-11-14 Thread jack (JIRA)
jack created AIRFLOW-3350:
-

 Summary: docs - explain how to use bitshift with lists
 Key: AIRFLOW-3350
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3350
 Project: Apache Airflow
  Issue Type: Improvement
  Components: docs
Affects Versions: 1.10.0
Reporter: jack
Assignee: jack
 Fix For: 2.0.0
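For reference, the pattern the docs entry should cover looks roughly like this 
(task names are illustrative and a dag object is assumed to exist):
{code:java}
# Bitshift composition with lists: fan out from one task to several, then fan back in.
from airflow.operators.dummy_operator import DummyOperator

start = DummyOperator(task_id='start', dag=dag)
end = DummyOperator(task_id='end', dag=dag)
workers = [DummyOperator(task_id='worker_%d' % i, dag=dag) for i in range(3)]

start >> workers   # start is upstream of every task in the list
end << workers     # every task in the list is upstream of end
{code}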






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3337) "About" page version info is not available

2018-11-13 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686141#comment-16686141
 ] 

jack commented on AIRFLOW-3337:
---

I think this was fixed by 
[https://github.com/apache/incubator-airflow/pull/4072]. Upgrading to 1.10.1, 
once released, should solve the issue.

[~kaxilnaik] can you check it?

> "About" page version info is not available
> --
>
> Key: AIRFLOW-3337
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3337
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Dmytro Kulyk
>Priority: Minor
> Attachments: image-2018-11-14-01-00-58-743.png
>
>
> From the Airflow 1.10.0 ui, click about and the resulting page shows version 
> and git version as "Not available"
> Version has been upgraded from 1.9 over 
> {code}
> pip install apache-airflow=1.10.0
> {code}
>   !image-2018-11-14-01-00-58-743.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3278) update operators uploading to GCS to support gzip flag

2018-11-08 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-3278:
--
Description: 
gzip flag was added to hook by PR:

[https://issues.apache.org/jira/browse/AIRFLOW-2932|https://github.com/apache/incubator-airflow/pull/3893]

 

*Needs to modify following operators to support the gzip flag:*

S3ToGoogleCloudStorageOperator

PostgresToGoogleCloudStorageOperator

CassandraToGoogleCloudStorageOperator

MySqlToGoogleCloudStorageOperator

GoogleCloudStorageToGoogleCloudStorageOperator  (can be used to convert non 
gzip file to gzip)

 

*Operators that already support the gzip flag:*

FileToGoogleCloudStorageOperator

 

*Others:*

BigQueryToCloudStorageOperator  - has separated compression flag

  was:
gzip flag was added to hook by PR:

[https://issues.apache.org/jira/browse/AIRFLOW-2932|https://github.com/apache/incubator-airflow/pull/3893]

 

*Needs to modify following operators to support the gzip flag:*

FileToGoogleCloudStorageOperator

S3ToGoogleCloudStorageOperator

PostgresToGoogleCloudStorageOperator

CassandraToGoogleCloudStorageOperator

MySqlToGoogleCloudStorageOperator

GoogleCloudStorageToGoogleCloudStorageOperator  (can be used to convert non 
gzip file to gzip)

 

*Operators that already support the gzip flag:*

FileToGoogleCloudStorageOperator

 

*Others:*

BigQueryToCloudStorageOperator  - has separated compression flag


> update operators uploading to GCS to support gzip flag
> --
>
> Key: AIRFLOW-3278
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3278
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>
> gzip flag was added to hook by PR:
> [https://issues.apache.org/jira/browse/AIRFLOW-2932|https://github.com/apache/incubator-airflow/pull/3893]
>  
> *Needs to modify following operators to support the gzip flag:*
> S3ToGoogleCloudStorageOperator
> PostgresToGoogleCloudStorageOperator
> CassandraToGoogleCloudStorageOperator
> MySqlToGoogleCloudStorageOperator
> GoogleCloudStorageToGoogleCloudStorageOperator  (can be used to convert non 
> gzip file to gzip)
>  
> *Operators that already support the gzip flag:*
> FileToGoogleCloudStorageOperator
>  
> *Others:*
> BigQueryToCloudStorageOperator  - has separated compression flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3182) 'all_done' trigger rule works incorrectly with BranchPythonOperator upstream tasks

2018-11-08 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679567#comment-16679567
 ] 

jack commented on AIRFLOW-3182:
---

Could the trigger rule you set be wrong?

I think you should do:

run_aggregation = PythonOperator(
    task_id='daily_aggregation',
    python_callable=run_daily_aggregation,
    provide_context=True,
    trigger_rule=TriggerRule.ALL_SUCCESS,
    dag=dag
)

This means that daily_aggregation will start only when start, hour_branching and 
task_for_hour-23 have all succeeded.

In your example daily_aggregation will start when start, hour_branching and 
task_for_hour-23 are done, and SKIPPED counts as done, so it makes sense that it 
runs every hour.

> 'all_done' trigger rule works incorrectly with BranchPythonOperator upstream 
> tasks
> --
>
> Key: AIRFLOW-3182
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3182
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Greg H
>Priority: Major
> Attachments: BrannchPythonOperator.png
>
>
> We have a job that runs some data processing every hour. At the end of the 
> day we need to run aggregation on all data generated by the 'hourly' jobs, 
> regardless if any 'hourly' job failed or not. For this purpose we have 
> prepared DAG that uses BranchPythonOperator in order to decide which 'hourly' 
> job needs to be run in given time and when task for hour 23 is done, we 
> trigger the aggregation (downstream). For this to work regardless of the last 
> 'hourly' task status the *'all_done'* trigger rule is set in the aggregation 
> task. Unfortunately, such configuration works incorrectly causing aggregation 
> task to be run after every 'hourly' task, despite the fact the aggregation 
> task is set as downstream for 'task_for_hour-23' +only+:
>   !BrannchPythonOperator.png!
> Here's sample code:
> {code:java}
> # coding: utf-8
> from airflow import DAG
> from airflow.operators.python_operator import PythonOperator
> from airflow.operators.python_operator import BranchPythonOperator
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.models import TriggerRule
> from datetime import datetime
> import logging
> dag_id = 'test'
> today = datetime.today().strftime("%Y-%m-%d");
> task_prefix = 'task_for_hour-'
> default_args = {
> 'owner': 'airflow',
> 'depends_on_past': False,
> 'start_date': datetime(2018, 6, 18),
> 'catchup': False,
> }
> dag = DAG(
> dag_id=dag_id,
> default_args=default_args,
> schedule_interval="@hourly",
> catchup=False
> )
> # Setting the current hour
> def get_current_hour():
> return datetime.now().hour
> # Returns the name id of the task to launch next (task_for_hour-0, 
> task_for_hour-1, etc.)
> def branch():
> return task_prefix + str(get_current_hour())
> # Running hourly job
> def run_hourly_job(**kwargs):
> current_hour = get_current_hour()
> logging.info("Running job for hour: %s" % current_hour)
> # Main daily aggregation
> def run_daily_aggregation(**kwargs):
> logging.info("Running daily aggregation for %s" % today)
> 
> start_task = DummyOperator(
> task_id='start',
> dag=dag
> )
> # 'branch' method returns name of the task to be run next.
> hour_branching = BranchPythonOperator(
> task_id='hour_branching',
> python_callable=branch,
> dag=dag)
> run_aggregation = PythonOperator(
> task_id='daily_aggregation',
> python_callable=run_daily_aggregation,
> provide_context=True,
> trigger_rule=TriggerRule.ALL_DONE,
> dag=dag
> )
> start_task.set_downstream(hour_branching)
> # Create tasks for each hour
> for hour in range(24):
> if hour == 23:
> task_for_hour_23 = PythonOperator(
> task_id=task_prefix + '23',
> python_callable=run_hourly_job,
> provide_context=True,
> dag=dag
> )
> hour_branching.set_downstream(task_for_hour_23)
> task_for_hour_23.set_downstream(run_aggregation)
> else:
> hour_branching.set_downstream(PythonOperator(
> task_id=task_prefix + str(hour),
> python_callable=run_hourly_job,
> provide_context=True,
> dag=dag)
> )
> {code}
> This may also be related to AIRFLOW-1419



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3309) Missing Mongo DB connection type

2018-11-07 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678283#comment-16678283
 ] 

jack commented on AIRFLOW-3309:
---

This should have been covered by 
https://issues.apache.org/jira/browse/AIRFLOW-83, shouldn't it?

> Missing Mongo DB connection type
> 
>
> Key: AIRFLOW-3309
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3309
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.10.0
>Reporter: John Cheng
>Assignee: John Cheng
>Priority: Minor
> Fix For: 1.10.1
>
>
> Unable to choose Mongo DB on the admin console connection page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2022) Web GUI paged entries do not show Last Run and DAG Runs

2018-11-07 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678241#comment-16678241
 ] 

jack edited comment on AIRFLOW-2022 at 11/7/18 1:50 PM:


I can't reproduce this.

It might already be solved.

Are you experiencing this with master or with 1.8.2?


was (Author: jackjack10):
can't reproduce this.

Might be solved already.

> Web GUI paged entries do not show Last Run and DAG Runs
> ---
>
> Key: AIRFLOW-2022
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2022
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.2
>Reporter: Yee Ting Li
>Priority: Minor
>
> i've gone past the 25 dags that fit onto one single page in the web frontend 
> view.
> however, when i page to the page 2, my DAGs do not show the last run nor the 
> DAG Runs column data.
> if i switch to 'Show 50 entries', i still do not see these columns on the 
> items 26 and above.
> the Recent Tasks appear to work fine however.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2022) Web GUI paged entries do not show Last Run and DAG Runs

2018-11-07 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678241#comment-16678241
 ] 

jack commented on AIRFLOW-2022:
---

I can't reproduce this.

It might already be solved.

> Web GUI paged entries do not show Last Run and DAG Runs
> ---
>
> Key: AIRFLOW-2022
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2022
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.2
>Reporter: Yee Ting Li
>Priority: Minor
>
> i've gone past the 25 dags that fit onto one single page in the web frontend 
> view.
> however, when i page to the page 2, my DAGs do not show the last run nor the 
> DAG Runs column data.
> if i switch to 'Show 50 entries', i still do not see these columns on the 
> items 26 and above.
> the Recent Tasks appear to work fine however.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-284) HiveHook Cursor Scope Persistency

2018-11-05 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675118#comment-16675118
 ] 

jack commented on AIRFLOW-284:
--

This was fixed:

https://github.com/apache/incubator-airflow/pull/1629

> HiveHook Cursor Scope Persistency
> -
>
> Key: AIRFLOW-284
> URL: https://issues.apache.org/jira/browse/AIRFLOW-284
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, hooks
>Affects Versions: 2.0.0
> Environment: Apache
>Reporter: Sherwain Williamson
>Assignee: Sherwain Williamson
>Priority: Major
> Fix For: 2.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The function `get_results` implemented in the `HiveServer2Hook` does not 
> execute multiple commands passed to it in a list, in the singular cursor 
> scope. This has caused SQL statements that depend on the execution of add 
> `jar` and `set` commands to fail as they are being executed in different 
> cursor scopes which are not persistent.
> The code has been updated to have the cursor object persistent.
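
For illustration, a minimal sketch of the single-cursor pattern the description refers 
to, so that "add jar" and "set" commands stay in scope for the statements that follow. 
The names conn and hql are placeholders, not the hook's actual attributes:

{code:python}
# Sketch only: run every statement on one cursor so session-level commands
# ("add jar", "set ...") remain visible to the SQL executed afterwards.
cur = conn.cursor()
try:
    for statement in hql:          # hql: list of statements (placeholder name)
        cur.execute(statement)
    results = cur.fetchall()       # results of the last statement
finally:
    cur.close()
{code}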



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3062) Add Qubole in integration docs

2018-11-05 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675109#comment-16675109
 ] 

jack commented on AIRFLOW-3062:
---

This can be closed; it was merged.

> Add Qubole in integration docs
> --
>
> Key: AIRFLOW-3062
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3062
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Sumit Maheshwari
>Assignee: Sumit Maheshwari
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2323) Should we replace the librabbitmq with other library in setup.py for Apache Airflow 1.9+?

2018-11-01 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671676#comment-16671676
 ] 

jack commented on AIRFLOW-2323:
---

According to 
[https://github.com/celery/librabbitmq/blob/master/Changelog]

librabbitmq 2.0.0 supports Python 3.4, 3.5 and 3.6

> Should we replace the librabbitmq with other library in setup.py for Apache 
> Airflow 1.9+?
> -
>
> Key: AIRFLOW-2323
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2323
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: A.Quasimodo
>Priority: Major
>
> As we know, latest librabbitmq is still can't support Python3,so, when I 
> executed the command *pip install apache-airflow[rabbitmq]*, some errors 
> happened.
> So, should we replace the librabbitmq with other libraries like 
> amqplib,py-amqp,.etc?
> Thank you



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (AIRFLOW-2323) Should we replace the librabbitmq with other library in setup.py for Apache Airflow 1.9+?

2018-11-01 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-2323:
--
Comment: was deleted

(was: According to [https://github.com/celery/librabbitmq/blob/master/Changelog]

librabbitmq 2.0.0 supports Python 3.4, 3.5, 3.6)

> Should we replace the librabbitmq with other library in setup.py for Apache 
> Airflow 1.9+?
> -
>
> Key: AIRFLOW-2323
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2323
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: A.Quasimodo
>Priority: Major
>
> As we know, latest librabbitmq is still can't support Python3,so, when I 
> executed the command *pip install apache-airflow[rabbitmq]*, some errors 
> happened.
> So, should we replace the librabbitmq with other libraries like 
> amqplib,py-amqp,.etc?
> Thank you



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2323) Should we replace the librabbitmq with other library in setup.py for Apache Airflow 1.9+?

2018-11-01 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658901#comment-16658901
 ] 

jack edited comment on AIRFLOW-2323 at 11/1/18 2:27 PM:


According to [https://github.com/celery/librabbitmq/blob/master/Changelog]

librabbitmq 2.0.0 supports Python 3.4, 3.5, 3.6


was (Author: jackjack10):
It doesn't seems like the librabbitmq lib is going to fix the problems. It's 
barely maintained.

> Should we replace the librabbitmq with other library in setup.py for Apache 
> Airflow 1.9+?
> -
>
> Key: AIRFLOW-2323
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2323
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: A.Quasimodo
>Priority: Major
>
> As we know, latest librabbitmq is still can't support Python3,so, when I 
> executed the command *pip install apache-airflow[rabbitmq]*, some errors 
> happened.
> So, should we replace the librabbitmq with other libraries like 
> amqplib,py-amqp,.etc?
> Thank you



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2837) tenacity 4.8.0 breaks with python3.7

2018-11-01 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671657#comment-16671657
 ] 

jack commented on AIRFLOW-2837:
---

Duplicate of https://issues.apache.org/jira/browse/AIRFLOW-2876

> tenacity 4.8.0 breaks with python3.7
> 
>
> Key: AIRFLOW-2837
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2837
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Adrian Bridgett
>Priority: Minor
>
> Tenacity 4.8.0 (as in setup.py) uses the reserved async keyword.
> Tenacity seems to lack a changelog, 4.12.0 seems to fix the problem but I 
> don't know what breaking changes may have occurred. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1068) Autocommit error with Airflow v1.7.1.3 and pymssql v2.1.3

2018-11-01 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671615#comment-16671615
 ] 

jack commented on AIRFLOW-1068:
---

Are you referring to this?

https://github.com/pymssql/pymssql/commit/dd42d5e962cd42eeb8e6ce24a0a09135e92ef990

> Autocommit error with Airflow v1.7.1.3 and pymssql v2.1.3
> -
>
> Key: AIRFLOW-1068
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1068
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, hooks
>Affects Versions: 1.7.1.3
> Environment: OS: Centos 6
> pymssql: 2.1.3
>Reporter: Thomas Christie
>Priority: Major
>
> After upgrading pymssql started getting the following error when trying to 
> use the MSSQL hook.
> [2017-04-04 13:02:27,260] {models.py:1286} ERROR - 'pymssql.Connection' 
> object attribute 'autocommit' is read-only
> Traceback (most recent call last):
>   File 
> "/home/jobrunner/.virtualenvs/airflow/lib/python3.5/site-packages/airflow/models.py",
>  line 1245, in run
> result = task_copy.execute(context=context)
>   File 
> "/home/jobrunner/.virtualenvs/airflow/lib/python3.5/site-packages/airflow/operators/mssql_operator.py",
>  line 34, in execute
> hook.run(self.sql, parameters=self.parameters)
>   File 
> "/home/jobrunner/.virtualenvs/airflow/lib/python3.5/site-packages/airflow/hooks/dbapi_hook.py",
>  line 124, in run
> self.set_autocommit(conn, autocommit)
>   File 
> "/home/jobrunner/.virtualenvs/airflow/lib/python3.5/site-packages/airflow/hooks/dbapi_hook.py",
>  line 138, in set_autocommit
> conn.autocommit = autocommit
> AttributeError: 'pymssql.Connection' object attribute 'autocommit' is 
> read-only
> I looked at the dbapi_hook.py file and this is the offending line:
> def set_autocommit(self, conn, autocommit):
> conn.autocommit = autocommit
> Changing the line to:
> conn.autocommit(autocommit)
> seems to work.  From what I understand, autocommit was a getter/setter method 
> in pymssql versions <2.0.0.  Maybe they've reverted the behavior?
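
A possible version-tolerant sketch of set_autocommit, assuming conn.autocommit may be 
exposed either as a method (pymssql) or as a writable attribute (most other DB-API 
drivers), as the description above suggests:

{code:python}
def set_autocommit(self, conn, autocommit):
    # Sketch only: handle drivers that expose autocommit as a callable
    # (pymssql) as well as those that expose a plain writable attribute.
    attr = getattr(conn, 'autocommit', None)
    if callable(attr):
        conn.autocommit(autocommit)
    else:
        conn.autocommit = autocommit
{code}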



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2941) example_http_operator.py Python 3.7 invalid syntax

2018-11-01 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671594#comment-16671594
 ] 

jack edited comment on AIRFLOW-2941 at 11/1/18 1:06 PM:


Should link this Jira to PR: 
[https://github.com/apache/incubator-airflow/pull/3723] 


was (Author: jackjack10):
Should like this Jira to PR: 
[https://github.com/apache/incubator-airflow/pull/3723] 

> example_http_operator.py Python 3.7 invalid syntax
> --
>
> Key: AIRFLOW-2941
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2941
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jon Davies
>Priority: Major
>
> example_http_operator.py fails on Python 3.7 with:
> {code:java}
> [2018-08-23 08:45:26,827] {models.py:365} ERROR - Failed to import: 
> /usr/local/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/site-packages/airflow/models.py", line 362, 
> in process_file
> m = imp.load_source(mod_name, filepath)
>   File "/usr/local/lib/python3.7/imp.py", line 172, in load_source
> module = _load(spec)
>   File "", line 696, in _load
>   File "", line 677, in _load_unlocked
>   File "", line 728, in exec_module
>   File "", line 219, in _call_with_frames_removed
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py",
>  line 27, in 
> from airflow.operators.http_operator import SimpleHttpOperator
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/operators/http_operator.py", 
> line 21, in 
> from airflow.hooks.http_hook import HttpHook
>   File "/usr/local/lib/python3.7/site-packages/airflow/hooks/http_hook.py", 
> line 23, in 
> import tenacity
>   File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 352
> from tenacity.async import AsyncRetrying
>   ^
> SyntaxError: invalid syntax
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2941) example_http_operator.py Python 3.7 invalid syntax

2018-11-01 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671594#comment-16671594
 ] 

jack commented on AIRFLOW-2941:
---

Should link this Jira to PR: 
[https://github.com/apache/incubator-airflow/pull/3723] 

> example_http_operator.py Python 3.7 invalid syntax
> --
>
> Key: AIRFLOW-2941
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2941
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jon Davies
>Priority: Major
>
> example_http_operator.py fails on Python 3.7 with:
> {code:java}
> [2018-08-23 08:45:26,827] {models.py:365} ERROR - Failed to import: 
> /usr/local/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/site-packages/airflow/models.py", line 362, 
> in process_file
> m = imp.load_source(mod_name, filepath)
>   File "/usr/local/lib/python3.7/imp.py", line 172, in load_source
> module = _load(spec)
>   File "", line 696, in _load
>   File "", line 677, in _load_unlocked
>   File "", line 728, in exec_module
>   File "", line 219, in _call_with_frames_removed
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py",
>  line 27, in 
> from airflow.operators.http_operator import SimpleHttpOperator
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/operators/http_operator.py", 
> line 21, in 
> from airflow.hooks.http_hook import HttpHook
>   File "/usr/local/lib/python3.7/site-packages/airflow/hooks/http_hook.py", 
> line 23, in 
> import tenacity
>   File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 352
> from tenacity.async import AsyncRetrying
>   ^
> SyntaxError: invalid syntax
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2764) Description for dags

2018-11-01 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-2764:
--
Summary: Description for dags  (was: Discription for dags)

> Description for dags
> 
>
> Key: AIRFLOW-2764
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2764
> Project: Apache Airflow
>  Issue Type: Wish
>Reporter: jack
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently in the UI we see the dag_id of the DAG.
> When many users use the UI it's not always clear what the DAG does.
> It will be extremely helpful if in the UI we can see a short description of 
> the DAG.
>  Example:
> Dag   Description  Schedule 
>  Report    this is the description  0 4 * * *



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3278) update operators uploading to GCS to support gzip flag

2018-10-31 Thread jack (JIRA)
jack created AIRFLOW-3278:
-

 Summary: update operators uploading to GCS to support gzip flag
 Key: AIRFLOW-3278
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3278
 Project: Apache Airflow
  Issue Type: Task
Affects Versions: 1.10.0
Reporter: jack


The gzip flag was added to the hook by this PR:

[https://issues.apache.org/jira/browse/AIRFLOW-2932|https://github.com/apache/incubator-airflow/pull/3893]

 

*The following operators need to be modified to support the gzip flag:*

FileToGoogleCloudStorageOperator

S3ToGoogleCloudStorageOperator

PostgresToGoogleCloudStorageOperator

CassandraToGoogleCloudStorageOperator

MySqlToGoogleCloudStorageOperator

GoogleCloudStorageToGoogleCloudStorageOperator  (can be used to convert a non-gzip 
file to gzip)

 

*Operators that already support the gzip flag:*

FileToGoogleCloudStorageOperator

 

*Others:*

BigQueryToCloudStorageOperator  - has a separate compression flag
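
For illustration, a rough sketch of how the flag could look from a DAG once an 
operator exposes it, modelled on FileToGoogleCloudStorageOperator. The import path and 
parameter names here are assumptions based on the contrib operator, and the gzip 
keyword mirrors the hook flag from AIRFLOW-2932:

{code:python}
# Sketch only: names and paths are assumptions, not a confirmed API.
from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator

upload = FileToGoogleCloudStorageOperator(
    task_id='upload_compressed',
    src='/tmp/report.csv',            # placeholder local file
    dst='reports/report.csv.gz',      # placeholder GCS object name
    bucket='my-bucket',               # placeholder bucket
    gzip=True,                        # compress before uploading
    dag=dag,
)
{code}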



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2644) Allow wild-cards in the search box in the UI

2018-10-31 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-2644:
--
Summary: Allow wild-cards in the search box in the UI  (was: Allow 
whild-cards in the search box in the UI)

> Allow wild-cards in the search box in the UI
> 
>
> Key: AIRFLOW-2644
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2644
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.9.0, 1.10.0, 2.0.0
>Reporter: jack
>Priority: Major
>
> In the UI there is a search box.
> If you search DAG name you will see the result for the search as you type.
> Please allow support of wild-cards. Mainly for : *
>  
> So if I have a Dag called :abcd and I'm searching for ab* I will see it in 
> the list.
>  
> This is very helpful for systems with 100+ dags.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2541) Ability to discover custom plugins, operators, sensors, etc. from various locations

2018-10-31 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669870#comment-16669870
 ] 

jack commented on AIRFLOW-2541:
---

[~riteshshrv]

Why not submit your code suggestion as a PR?

> Ability to discover custom plugins, operators, sensors, etc. from various 
> locations
> ---
>
> Key: AIRFLOW-2541
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2541
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: plugins
>Reporter: Ritesh Shrivastav
>Assignee: Ritesh Shrivastav
>Priority: Minor
>  Labels: newbie, patch-available
> Attachments: plugin_manager.diff
>
>
> Provision to create custom plugins without dropping them into 
> `$AIRFLOW_HOME/plugins` (or the directory defined in `airflow.cfg`). 
> We can define one location in `airflow.cfg` but if we have multiple projects 
> which will have their own workflows so, it would be ideal to implement custom 
> plugins (along with operators, sensors, etc.) in those repositories itself 
> and let them get discovered from there.
>  
> A simple patch in attachment seems to solve this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3269) mysql_hook replace depricated lib MySQLdb with PyMySQL

2018-10-29 Thread jack (JIRA)
jack created AIRFLOW-3269:
-

 Summary: mysql_hook replace depricated lib MySQLdb with PyMySQL
 Key: AIRFLOW-3269
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3269
 Project: Apache Airflow
  Issue Type: Task
Affects Versions: 1.10.0
Reporter: jack


mysql_hook uses the MySQLdb library.

This library is deprecated and has not been maintained since 2014.

The library also does not work properly with newer MySQL databases.

We should switch to the PyMySQL package:

[https://github.com/PyMySQL/PyMySQL]

The required modifications are relatively small.
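
As a sketch of how small the change could be: PyMySQL ships a documented compatibility 
shim that lets MySQLdb-style code run on top of it. Whether the hook should use the 
shim or call pymysql directly is an open question; the connection values below are 
placeholders:

{code:python}
# Sketch only: PyMySQL's install_as_MySQLdb() makes "import MySQLdb"
# resolve to PyMySQL, so existing MySQLdb-style code keeps working.
import pymysql
pymysql.install_as_MySQLdb()

import MySQLdb  # now backed by PyMySQL
conn = MySQLdb.connect(host='localhost', user='airflow',
                       passwd='***', db='airflow')  # placeholder credentials
{code}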



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1533) differing arrow's color between tasks in graph view based on trigger rule

2018-10-29 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667108#comment-16667108
 ] 

jack commented on AIRFLOW-1533:
---

This is a nice idea

> differing arrow's color between tasks in graph view based on trigger rule
> -
>
> Key: AIRFLOW-1533
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1533
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: Yu Ishikawa
>Priority: Major
>
> h2. Proposal 
> How about differing arrow's color between tasks based on trigger rule.
> In a graph view, we can't identify trigger rules among tasks at a glance. We 
> must see the code view to understand them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2999) S3_hook - add the ability to download file to local disk

2018-10-29 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667096#comment-16667096
 ] 

jack commented on AIRFLOW-2999:
---

[~wileeam] I saw you assigned yourself on

[https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2332], which is a 
duplicate of this ticket.

Are you working on this?

 

 

> S3_hook  - add the ability to download file to local disk
> -
>
> Key: AIRFLOW-2999
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2999
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>
> The [S3_hook 
> |https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/S3_hook.py#L177]
>  has a get_key method that returns a boto3.s3.Object; it also has a load_file method 
> which loads a file from the local file system to S3.
>  
> What it doesn't have is a method to download a file from S3 to the local file 
> system.
> Basically it should be something very simple... an extension to the get_key 
> method with a parameter for the destination on the local file system, adding code 
> that takes the boto3.s3.Object and saves it to disk.  Note that it can be 
> more than one file if the user chooses a folder in S3.
>  
>  
>  
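
For illustration, a sketch of the kind of helper being requested, assuming get_key 
returns a boto3 Object as the description says (the method name download_file here is 
hypothetical, not an existing hook method):

{code:python}
# Sketch only: hypothetical helper built on the hook's existing get_key().
def download_file(self, key, local_path, bucket_name=None):
    obj = self.get_key(key, bucket_name)   # boto3 S3.Object per the description
    obj.download_file(local_path)          # boto3 performs the actual transfer
    return local_path
{code}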



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (AIRFLOW-2999) S3_hook - add the ability to download file to local disk

2018-10-29 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-2999:
--
Comment: was deleted

(was: [~wileeam] I saw you assigned yourself on

[https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2332]  which is 
duplicates of this ticket.

Are you working on this?

 

 )

> S3_hook  - add the ability to download file to local disk
> -
>
> Key: AIRFLOW-2999
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2999
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>
> The [S3_hook 
> |https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/S3_hook.py#L177]
>  has a get_key method that returns a boto3.s3.Object; it also has a load_file method 
> which loads a file from the local file system to S3.
>  
> What it doesn't have is a method to download a file from S3 to the local file 
> system.
> Basically it should be something very simple... an extension to the get_key 
> method with a parameter for the destination on the local file system, adding code 
> that takes the boto3.s3.Object and saves it to disk.  Note that it can be 
> more than one file if the user chooses a folder in S3.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2999) S3_hook - add the ability to download file to local disk

2018-10-29 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667093#comment-16667093
 ] 

jack commented on AIRFLOW-2999:
---

[~wileeam] I saw you assigned yourself on

[https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2332], which is a 
duplicate of this ticket.

Are you working on this?

 

 

> S3_hook  - add the ability to download file to local disk
> -
>
> Key: AIRFLOW-2999
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2999
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>
> The [S3_hook 
> |https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/S3_hook.py#L177]
>  has a get_key method that returns a boto3.s3.Object; it also has a load_file method 
> which loads a file from the local file system to S3.
>  
> What it doesn't have is a method to download a file from S3 to the local file 
> system.
> Basically it should be something very simple... an extension to the get_key 
> method with a parameter for the destination on the local file system, adding code 
> that takes the boto3.s3.Object and saves it to disk.  Note that it can be 
> more than one file if the user chooses a folder in S3.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2679) GoogleCloudStorageToBigQueryOperator to support MERGE

2018-10-29 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-2679:
--
Description: 
Currently the GoogleCloudStorageToBigQueryOperator supports the write_disposition 
parameter, which can be: WRITE_TRUNCATE, WRITE_APPEND, WRITE_EMPTY.

 

However Google has another very useful writing method, MERGE:

[https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_examples]

Support for the MERGE statement would be extremely useful.

The idea behind this request is to do it directly from the Google Storage file rather 
than load the file into a table and then run another MERGE statement.

 

The MERGE statement is really helpful when one wants records to be updated rather 
than appended or replaced.

  was:
Currently the 
{color:#22}GoogleCloudStorageToBigQueryOp{color}{color:#22}erator 
support incremental load using 
*{color:#404040}max_id_key{color}*{color:#404040} {color}.{color}

 

{color:#22}However many systems actually needs "UPSERT" in terms of - if 
row exists update it, if not insert/copy it.{color}

{color:#22}Currently the operator assumes that we only need to insert new 
data, it can't handle update of data. Most of the time data is not static it 
changes with time. Yesterday order status was NEW today it's Processing, 
tomorrow it's SENT in a month it will be REFUNDED etc... {color}

 

{color:#22} {color}

Summary: GoogleCloudStorageToBigQueryOperator to support MERGE  (was: 
GoogleCloudStorageToBigQueryOperator to support UPSERT)

> GoogleCloudStorageToBigQueryOperator to support MERGE
> -
>
> Key: AIRFLOW-2679
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2679
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: jack
>Priority: Major
>
> Currently the GoogleCloudStorageToBigQueryOperator supports the write_disposition 
> parameter, which can be: WRITE_TRUNCATE, WRITE_APPEND, WRITE_EMPTY.
>  
> However Google has another very useful writing method, MERGE:
> [https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_examples]
> Support for the MERGE statement would be extremely useful.
> The idea behind this request is to do it directly from the Google Storage file 
> rather than load the file into a table and then run another MERGE statement.
>  
> The MERGE statement is really helpful when one wants records to be updated 
> rather than appended or replaced.
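
Until such support exists, the two-step pattern the request wants to avoid can be 
sketched roughly as below. Task ids, table names and the MERGE statement are 
placeholders, and the exact operator arguments may differ between Airflow versions:

{code:python}
# Sketch only: load GCS data into a staging table, then MERGE into the target.
load_staging = GoogleCloudStorageToBigQueryOperator(
    task_id='load_staging',
    bucket='my-bucket',                              # placeholder
    source_objects=['exports/orders.json'],          # placeholder
    destination_project_dataset_table='proj.ds.orders_staging',
    write_disposition='WRITE_TRUNCATE',
    dag=dag,
)

merge_into_target = BigQueryOperator(
    task_id='merge_into_target',
    sql="""
        MERGE `proj.ds.orders` T
        USING `proj.ds.orders_staging` S
        ON T.order_id = S.order_id
        WHEN MATCHED THEN UPDATE SET T.status = S.status
        WHEN NOT MATCHED THEN INSERT ROW
    """,
    use_legacy_sql=False,
    dag=dag,
)

load_staging >> merge_into_target
{code}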



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1079) Dag is still visible in ui, after removing it from the dag folder

2018-10-28 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1347#comment-1347
 ] 

jack commented on AIRFLOW-1079:
---

Deleting the DAG file manually doesn't delete it from the database, so it still 
appears in the UI.

You should upgrade to version 1.10.0 - a delete DAG option was introduced in the 
UI.

> Dag is still visible in ui, after removing it from the dag folder
> -
>
> Key: AIRFLOW-1079
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1079
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: 1.8.0
>Reporter: Arun M J
>Priority: Major
>
> Even after removing a dag from the folder, it is shown in ui.
> can we set  `is_active` in table `dag`  for those dags while filling the 
> dag-bags to resolve this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-351) Failed to clear downstream tasks

2018-10-28 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1344#comment-1344
 ] 

jack commented on AIRFLOW-351:
--

There seems to be an open (old) PR for this which isn't listed in the ticket or 
in the discussion:

https://github.com/apache/incubator-airflow/pull/2543

> Failed to clear downstream tasks
> 
>
> Key: AIRFLOW-351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models, subdag, webserver
>Affects Versions: 1.7.1.3
>Reporter: Adinata
>Priority: Major
> Attachments: dag_error.py, error.log, error_on_clear_dag.txt, 
> ubuntu-14-packages.log, ubuntu-16-oops.log, ubuntu-16-packages.log
>
>
> {code}
>   / (  ()   )  \___
>  /( (  (  )   _))  )   )\
>(( (   )()  )   (   )  )
>  ((/  ( _(   )   (   _) ) (  () )  )
> ( (  ( (_)   (((   )  .((_ ) .  )_
>( (  )(  (  ))   ) . ) (   )
>   (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
>   ( (  (   ) (  )   (  )) ) _)(   )  )  )
>  ( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
>   (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
>  ( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
>   ((  (   )(( _)   _) _(_ (  (_ )
>(_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
>((__)\\||lll|l||///  \_))
> (   /(/ (  )  ) )\   )
>   (( ( ( | | ) ) )\   )
>(   /(| / ( )) ) ) )) )
>  ( ( _(|)_) )
>   (  ||\(|(|)|/|| )
> (|(||(||))
>   ( //|/l|||)|\\ \ )
> (/ / //  /|//\\  \ \  \ _)
> ---
> Node: 9889a7c79e9b
> ---
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 
> 367, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 755, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line 
> 118, in wrapper
> return f(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line 
> 167, in wrapper
> return f(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 
> 1017, in clear
> include_upstream=upstream)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 2870, 
> in sub_dag
> dag = copy.deepcopy(self)
>   File "/usr/lib/python2.7/copy.py", line 174, in deepcopy
> y = copier(memo)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 2856, 
> in __deepcopy__
> setattr(result, k, copy.deepcopy(v, memo))
>   File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
> y = copier(x, memo)
>   File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
> y[deepcopy(key, memo)] = deepcopy(value, memo)
>   File "/usr/lib/python2.7/copy.py", line 174, in deepcopy
> y = copier(memo)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1974, 
> in __deepcopy__
> setattr(result, k, copy.deepcopy(v, memo))
>   File "/usr/lib/python2.7/copy.py", line 190, in deepcopy
> y = _reconstruct(x, rv, 1, memo)
>   File "/usr/lib/python2.7/copy.py", line 334, in _reconstruct
> state = deepcopy(state, memo)
>   File "/usr/lib/python2.7/copy.py", line 

[jira] [Commented] (AIRFLOW-2925) gcp dataflow hook doesn't show traceback

2018-10-28 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1334#comment-1334
 ] 

jack commented on AIRFLOW-2925:
---

[~xnuinside] Correct me if I'm wrong, but when raise Exception("DataFlow failed 
with return code {}".format(self._proc.returncode)) is executed, you should see a 
line in the log with: DataFlow failed with return code VALUE

I see no such rows in the logs, which means the code you executed never 
reached this exception line.
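
As an aside, a sketch of how the hook could surface more context before raising, 
assuming self._proc is the subprocess.Popen handle from the quoted snippet, its 
output streams were piped, and the hook exposes a logger as self.log:

{code:python}
# Sketch only: log whatever the Dataflow process wrote before raising.
if self._proc.returncode != 0:
    stderr = self._proc.stderr.read() if self._proc.stderr else b''
    self.log.error("DataFlow stderr:\n%s", stderr.decode('utf-8', 'replace'))
    raise Exception("DataFlow failed with return code {}".format(
        self._proc.returncode))
{code}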

> gcp dataflow hook doesn't show traceback
> 
>
> Key: AIRFLOW-2925
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2925
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>  Labels: easyfix
> Attachments: Screen Shot 2018-10-23 at 1.40.43 PM.png
>
>
> The gcp_dataflow_hook.py has:
>  
> {code:java}
> if self._proc.returncode is not 0:   
> raise Exception("DataFlow failed with return code 
> {}".format(self._proc.returncode))
> {code}
>  
> This does not show the full trace of the error which makes it harder to 
> understand the problem.
> [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L171]
>  
>  
> reported on gitter by Oscar Carlsson



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3254) BigQueryGetDataOperator to support reading query from SQL file

2018-10-24 Thread jack (JIRA)
jack created AIRFLOW-3254:
-

 Summary: BigQueryGetDataOperator to support reading query from SQL 
file
 Key: AIRFLOW-3254
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3254
 Project: Apache Airflow
  Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: jack


As discussed with [~Fokko] on Slack:

Currently the BigQueryGetDataOperator only supports a query provided directly as a 
string:

 
{code:java}
sql = 'SELECT ID FROM TABLE'
{code}
 

It does not support reading the query from a SQL file, which can be annoying as 
queries are sometimes quite large.

This behavior is supported by other operators like 
MySqlToGoogleCloudStorageOperator:

dag = DAG(
    dag_id='Import',
    default_args=args,
    schedule_interval='*/5 * * * *',
    max_active_runs=1,
    catchup=False,
    template_searchpath = ['/home/.../airflow/…/sql/Import']
)

 

importop = MySqlToGoogleCloudStorageOperator(
    task_id='import',
    mysql_conn_id='MySQL_con',
    google_cloud_storage_conn_id='gcp_con',
    provide_context=True,
    sql = 'importop.sql',
    params={'table_name': TABLE_NAME},
    bucket=GCS_BUCKET_ID,
    filename=file_name_orders,
    dag=dag)

 

If anyone can pick it up it would be great :)

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-235) Improve connectors interface

2018-10-24 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661974#comment-16661974
 ] 

jack commented on AIRFLOW-235:
--

I think this was already implemented.

The Google Cloud Platform connection has different fields than other connectors.

> Improve connectors interface
> 
>
> Key: AIRFLOW-235
> URL: https://issues.apache.org/jira/browse/AIRFLOW-235
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 1.7.1.2
>Reporter: Jakob Homan
>Assignee: Chi Su
>Priority: Major
>
> Right now the connections interface has the same fields for all connectors, 
> whether or not they apply.  Per-connector values are stuffed into the extra 
> field, which doesn't have any description or clarification.  Connectors don't 
> have any way of displaying what extra information they require.
> It would be better if connectors could define what fields they specified 
> through the interface (a map of field name to type, description, validator, 
> etc).  The connector web page could then render these and pass them back to 
> the connector when it is instantiated. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-376) TypeError("Boolean value of this clause is not defined")

2018-10-24 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661971#comment-16661971
 ] 

jack commented on AIRFLOW-376:
--

[~xuanji] is this still an issue?

> TypeError("Boolean value of this clause is not defined")
> 
>
> Key: AIRFLOW-376
> URL: https://issues.apache.org/jira/browse/AIRFLOW-376
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Li Xuanji
>Assignee: Li Xuanji
>Priority: Minor
>
> With this dag,
> ```
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> from datetime import datetime, timedelta
> default_args = {
> 'owner': 'airflow',
> 'depends_on_past': False,
> 'start_date': datetime(2016, 1, 1, 1, 0),
> 'email': ['xua...@gmail.com'],
> 'email_on_failure': True,
> 'email_on_retry': False,
> 'retries': 3,
> 'retry_delay': timedelta(minutes=1),
> }
> dag = DAG('bash_bash_bash', default_args=default_args)
> # t1, t2 and t3 are examples of tasks created by instatiating operators
> t1 = BashOperator(
> task_id='print_date',
> bash_command='date',
> dag=dag)
> t2 = BashOperator(
> task_id='sleep',
> bash_command='sleep 5',
> retries=3,
> dag=dag)
> templated_command = """
> {% for i in range(5) %}
> echo "{{ ds }}"
> echo "{{ macros.ds_add(ds, 7)}}"
> echo "{{ params.my_param }}"
> {% endfor %}
> """
> t3 = BashOperator(
> task_id='templated',
> bash_command=templated_command,
> params={'my_param': 'Parameter I passed in'},
> dag=dag)
> t2.set_upstream(t1)
> t3.set_upstream(t1)
> ```
> I get an error while running the scheduler
> ```
> [2016-07-27 21:40:57,468] {jobs.py:669} ERROR - Boolean value of this clause 
> is not defined
> Traceback (most recent call last):
>   File "/Users/xuanji_li/tools/zodiac-airflow/airflow/jobs.py", line 667, in 
> _do_dags
> self.manage_slas(dag)
>   File "/Users/xuanji_li/tools/zodiac-airflow/airflow/utils/db.py", line 53, 
> in wrapper
> result = func(*args, **kwargs)
>   File "/Users/xuanji_li/tools/zodiac-airflow/airflow/jobs.py", line 299, in 
> manage_slas
> .filter(SlaMiss.email_sent.is_(False) or 
> SlaMiss.notification_sent.is_(False))
>   File "/Library/Python/2.7/site-packages/sqlalchemy/sql/elements.py", line 
> 2760, in __bool__
> raise TypeError("Boolean value of this clause is not defined")
> TypeError: Boolean value of this clause is not defined
> ```
> Mainly opening this to remind myself to take a look at it
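
For reference, the failing filter combines SQLAlchemy clauses with Python's `or`, 
which is exactly what raises the quoted TypeError. A sketch of the intended 
expression, assuming session and SlaMiss are in scope as in the traceback:

{code:python}
from sqlalchemy import or_

# Sketch only: build the OR inside SQLAlchemy; Python's `or` tries to
# evaluate a clause as a boolean, which raises the quoted TypeError.
qry = session.query(SlaMiss).filter(
    or_(SlaMiss.email_sent.is_(False),
        SlaMiss.notification_sent.is_(False))
)
{code}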



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1753) Can't install on windows 10

2018-10-22 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659062#comment-16659062
 ] 

jack commented on AIRFLOW-1753:
---

[~ashb] I think it's worth mentioning in the docs (at least in the FAQ section) 
that Airflow currently can't be installed on Windows. I've seen this question in 
many places.

> Can't install on windows 10
> ---
>
> Key: AIRFLOW-1753
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1753
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Lakshman Udayakantha
>Priority: Major
>
> When I installed airflow using "pip install airflow command" two errors pop 
> up.
> 1.  link.exe failed with exit status 1158
> 2.\x86_amd64\\cl.exe' failed with exit status 2
> first issue can be solved by reffering 
> https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421.
> But second issue is still there. there was no any solution by googling also. 
> how to prevent that issue and install airflow on windows 10 X64.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2141) Cannot create airflow variables when there is a list of dictionary as a value

2018-10-22 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658904#comment-16658904
 ] 

jack commented on AIRFLOW-2141:
---

This is related to a similar ticket I opened: 
https://issues.apache.org/jira/browse/AIRFLOW-3157

> Cannot create airflow variables when there is a list of dictionary as a value
> -
>
> Key: AIRFLOW-2141
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2141
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws
>Affects Versions: 1.8.0
>Reporter: Soundar
>Priority: Major
>  Labels: beginner, newbie
> Attachments: airflow_cli.png, airflow_cli2_crop.png
>
>
> I'm trying to create Airflow variables using a json file. I am trying to 
> import airflow variables using UI(webserver) when I upload the json file I 
> get this error "Missing file or syntax error" and when I try to upload using 
> airflow cli not all the variables gets uploaded properly. The catch is that I 
> have a list of dictionary in my json file, say
>  ex:
>  {
>  "demo_archivedir": "/home/ubuntu/folders/archive",
>  "demo_filepattern": [
> { "id": "reference", "pattern": "Sample Data.xlsx" }
> ,
> { "id": "sale", "pattern": "Sales.xlsx" }
> ],
>  "demo_sourcepath": "/home/ubuntu/folders/input",
>  "demo_workdir": "/home/ubuntu/folders/working"
>  }
> I've attached two images
> img1. Using airflow variables cli command I was able to create partial 
> variables from my json file(airflow_cli.png)img2. After inserting logs in the 
> "airflow/bin/cli.py" file, I got this error. (airflow_cli2_crop.png)
> The thing is I gave this value through the Admin UI one by one and it worked. 
> Then I exported those same variable using "airflow variables" cli command and 
> tried importing them, still it failed and the above mentioned error still 
> occurs.
> Note:
>    I am using Python 3.5 with Airflow version 1.8
> The stack trace is as follows
> .compute-1.amazonaws.com:22] out: 0 of 4 variables successfully updated.
> .compute-1.amazonaws.com:22] out: Traceback (most recent call last):
> .compute-1.amazonaws.com:22] out:   File "/home/ubuntu/Env/bin/airflow", line 
> 28, in 
> .compute-1.amazonaws.com:22] out: args.func(args)
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/bin/cli.py", line 242, 
> in variables
> .compute-1.amazonaws.com:22] out: import_helper(imp)
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/bin/cli.py", line 273, 
> in import_helper
> .compute-1.amazonaws.com:22] out: Variable.set(k, v)
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/utils/db.py", line 53, 
> in wrapper
> .compute-1.amazonaws.com:22] out: result = func(*args, **kwargs)
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/models.py", line 3615, 
> in set
> .compute-1.amazonaws.com:22] out: session.add(Variable(key=key, 
> val=stored_value))
> .compute-1.amazonaws.com:22] out:   File "", line 4, in __init__
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 
> 417, in _initialize_instance
> .compute-1.amazonaws.com:22] out: manager.dispatch.init_failure(self, 
> args, kwargs)
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/util/langhelpers.py",
>  line 66, in __exit__
> .compute-1.amazonaws.com:22] out: compat.reraise(exc_type, exc_value, 
> exc_tb)
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/util/compat.py", 
> line 187, in reraise
> .compute-1.amazonaws.com:22] out: raise value
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 
> 414, in _initialize_instance
> .compute-1.amazonaws.com:22] out: return 
> manager.original_init(*mixed[1:], **kwargs)
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/ext/declarative/base.py",
>  line 700, in _declarative_constructor
> .compute-1.amazonaws.com:22] out: setattr(self, k, kwargs[k])
> compute-1.amazonaws.com:22] out:   File "", line 1, in __set__
> .compute-1.amazonaws.com:22] out:   File 
> "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/models.py", line 3550, 
> in set_val
> .compute-1.amazonaws.com:22] out: self._val = FERNET.encrypt(bytes(value, 
> 'utf-8')).decode()
> .compute-1.amazonaws.com:22] out: TypeError: encoding without a string 
> argument
> .compute-1.amazonaws.com:22] out:



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2323) Should we replace the librabbitmq with other library in setup.py for Apache Airflow 1.9+?

2018-10-22 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658901#comment-16658901
 ] 

jack commented on AIRFLOW-2323:
---

It doesn't seem like the librabbitmq lib is going to fix the problems. It's 
barely maintained.

> Should we replace the librabbitmq with other library in setup.py for Apache 
> Airflow 1.9+?
> -
>
> Key: AIRFLOW-2323
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2323
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: A.Quasimodo
>Priority: Major
>
> As we know, latest librabbitmq is still can't support Python3,so, when I 
> executed the command *pip install apache-airflow[rabbitmq]*, some errors 
> happened.
> So, should we replace the librabbitmq with other libraries like 
> amqplib,py-amqp,.etc?
> Thank you



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2925) gcp dataflow hook doesn't show traceback

2018-10-22 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658863#comment-16658863
 ] 

jack commented on AIRFLOW-2925:
---

[~xnuinside] Where does the log show the exception message "DataFlow failed 
with return code..."?

> gcp dataflow hook doesn't show traceback
> 
>
> Key: AIRFLOW-2925
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2925
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>  Labels: easyfix
>
> The gcp_dataflow_hook.py has:
>  
> {code:java}
> if self._proc.returncode is not 0:   
> raise Exception("DataFlow failed with return code 
> {}".format(self._proc.returncode))
> {code}
>  
> This does not show the full trace of the error which makes it harder to 
> understand the problem.
> [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L171]
>  
>  
> reported on gitter by Oscar Carlsson



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2722) ECSOperator requires network configuration parameter when FARGATE launch type is used

2018-10-22 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658857#comment-16658857
 ] 

jack commented on AIRFLOW-2722:
---

[~ThomasVdBerge] this refers to your PR

> ECSOperator requires network configuration parameter when FARGATE launch type 
> is used
> -
>
> Key: AIRFLOW-2722
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2722
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws
>Affects Versions: 2.0.0
>Reporter: Craig Forster
>Priority: Major
>
> The 'FARGATE' launch type was added in AIRFLOW-2435, however when using that 
> launch mode the following error is returned:
> {noformat}
> Network Configuration must be provided when networkMode 'awsvpc' is specified.
> {noformat}
> Fargate-launched tasks use the "awsvpc" networking type, and as per the 
> [boto3 
> documentation|http://boto3.readthedocs.io/en/latest/reference/services/ecs.html#ECS.Client.run_task]
>  for run_task:
> {quote}This parameter is required for task definitions that use the awsvpc 
> network mode to receive their own Elastic Network Interface, and it is not 
> supported for other network modes.
> {quote}
> As it's currently implemented, the Fargate launch type is unusable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

