[jira] [Resolved] (AIRFLOW-2994) flatten_results in BigQueryOperator/BigQueryHook should default to None

2018-08-31 Thread Chris Riccomini (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-2994.
--
Resolution: Fixed

> flatten_results in BigQueryOperator/BigQueryHook should default to None
> ---
>
> Key: AIRFLOW-2994
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2994
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.0
>Reporter: Chris Riccomini
>Priority: Major
> Fix For: 1.10.1
>
>
> Upon upgrading to 1.10, we began seeing issues with our queries that were 
> using allow_large_results. They began failing because flatten_results now 
> defaults to False. This should default to unset (None), as it did before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2994) flatten_results in BigQueryOperator/BigQueryHook should default to None

2018-08-31 Thread Chris Riccomini (JIRA)
Chris Riccomini created AIRFLOW-2994:


 Summary: flatten_results in BigQueryOperator/BigQueryHook should 
default to None
 Key: AIRFLOW-2994
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2994
 Project: Apache Airflow
  Issue Type: Bug
  Components: gcp
Affects Versions: 1.10.0
Reporter: Chris Riccomini
 Fix For: 1.10.1


Upon upgrading to 1.10, we began seeing issues with our queries that were using 
allow_large_results. They began failing because flatten_results now defaults to 
False. This should default to unset (None), as it did before.
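
For illustration, a minimal sketch of the intended default handling (a hypothetical 
configuration-building helper, not the actual hook code):

{code}
def build_query_config(sql, flatten_results=None):
    query = {'query': sql, 'allowLargeResults': True}
    # Leaving flattenResults absent preserves BigQuery's server-side
    # default, which is what the pre-1.10 behavior relied on.
    if flatten_results is not None:
        query['flattenResults'] = flatten_results
    return {'query': query}
{code}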



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2326) Duplicate GCS copy operator

2018-04-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439636#comment-16439636
 ] 

Chris Riccomini commented on AIRFLOW-2326:
--

[~b11c], I suspect this was an oversight. I am fine with removing one of them. 
If we haven't released them (i.e. they were committed after 1.9), I suggest 
just deleting the one you think should be removed.

> Duplicate GCS copy operator
> ---
>
> Key: AIRFLOW-2326
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2326
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Berislav Lopac
>Priority: Minor
>
> I apologise if this is a known thing, but I have been wondering if anyone can 
> give a rationale for why we have two separate operators that copy Google 
> Cloud Storage objects -- specifically, 
> {{gcs_copy_operator.GoogleCloudStorageCopyOperator}} and 
> {{gcs_to_gcs.GoogleCloudStorageToGoogleCloudStorageOperator}}. As far as I 
> can tell, they have nearly the same functionality, with the latter being a bit 
> more flexible (thanks to the {{move_object}} flag).
> If both are not needed, I would like to propose removing one of them 
> (specifically, the {{gcs_copy_operator}} one); if necessary it can be made 
> into a wrapper/subclass of the other one, marked for deprecation.
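> For illustration, a minimal sketch of the wrapper approach (hypothetical, 
> assuming the two operators share constructor arguments):
> {code}
> import warnings
>
> from airflow.contrib.operators.gcs_to_gcs import (
>     GoogleCloudStorageToGoogleCloudStorageOperator)
>
>
> class GoogleCloudStorageCopyOperator(
>         GoogleCloudStorageToGoogleCloudStorageOperator):
>     """Deprecated alias kept for backwards compatibility (sketch)."""
>
>     def __init__(self, *args, **kwargs):
>         warnings.warn(
>             "gcs_copy_operator is deprecated; use gcs_to_gcs instead.",
>             DeprecationWarning)
>         super(GoogleCloudStorageCopyOperator, self).__init__(*args, **kwargs)
> {code}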



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2258) Allow for import of Parquet-format files into BigQuery

2018-03-27 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-2258.
--
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

> Allow for import of Parquet-format files into BigQuery
> --
>
> Key: AIRFLOW-2258
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2258
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: Airflow 2.0
>Reporter: Steve Conover
>Assignee: Steve Conover
>Priority: Minor
> Fix For: 1.10.0
>
>
> Update the "allowed_formats" in bigquery_operator.py to allow files of format 
> PARQUET.
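> For reference, a sketch of the change (format names as accepted by the 
> BigQuery load API; the exact list in the operator may differ):
> {code}
> allowed_formats = [
>     "CSV", "NEWLINE_DELIMITED_JSON", "AVRO", "GOOGLE_SHEETS",
>     "DATASTORE_BACKUP", "PARQUET",  # PARQUET newly allowed
> ]
> {code}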



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2166) BigQueryBaseCursor missing sql dialect parameter

2018-03-02 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-2166.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> BigQueryBaseCursor missing sql dialect parameter
> 
>
> Key: AIRFLOW-2166
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2166
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Winston Huang
>Assignee: Winston Huang
>Priority: Major
> Fix For: 1.10.0
>
>
> [https://github.com/apache/incubator-airflow/pull/2964] introduced a 
> backward-incompatible change to {{BigQueryBaseCursor}} by removing the 
> {{use_legacy_sql}} parameter from the {{run_query}} method. This parameter 
> should be restored and override the default cursor dialect when specified.
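> For illustration, a sketch of the restored parameter (simplified; the real 
> run_query takes many more arguments):
> {code}
> def run_query(self, bql, use_legacy_sql=None):
>     if use_legacy_sql is None:
>         use_legacy_sql = self.use_legacy_sql  # fall back to cursor default
>     configuration = {
>         'query': {'query': bql, 'useLegacySql': use_legacy_sql}
>     }
>     return self.run_with_configuration(configuration)
> {code}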



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2029) Fix AttributeError in BigQueryPandasConnector

2018-01-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-2029.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Fix AttributeError in BigQueryPandasConnector
> -
>
> Key: AIRFLOW-2029
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2029
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.0, 1.9.1
>Reporter: Feng Lu
>Assignee: Feng Lu
>Priority: Minor
> Fix For: 1.10.0
>
>
> When BigQueryPandasConnector (in bigquery_hook.py) encounters a BQ job 
> insertion error, the error is caught against connector.http_error, which is 
> declared on the parent connector GbqConnector but left uninitialized. Hence 
> the following AttributeError is thrown:
> [2018-01-23 01:03:36,873] \{base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 94, in get_pandas_df
> [2018-01-23 01:03:36,874] \{base_task_runner.py:98} INFO - Subtask: schema, pages = connector.run_query(bql)
> [2018-01-23 01:03:36,879] \{base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/pandas_gbq/gbq.py", line 503, in run_query
> [2018-01-23 01:03:36,881] \{base_task_runner.py:98} INFO - Subtask: except self.http_error as ex:
> [2018-01-23 01:03:36,888] \{base_task_runner.py:98} INFO - Subtask: AttributeError: 'BigQueryPandasConnector' object has no attribute 'http_error'
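> For illustration, a sketch of one possible fix, assigning the attribute the 
> parent's error handling expects (hypothetical; the actual patch may differ):
> {code}
> from googleapiclient.errors import HttpError
> from pandas_gbq.gbq import GbqConnector
>
>
> class BigQueryPandasConnector(GbqConnector):
>     def __init__(self, project_id, service, reauth=False, verbose=False):
>         self.project_id = project_id
>         self.reauth = reauth
>         self.service = service
>         self.verbose = verbose
>         # GbqConnector.run_query does `except self.http_error`, so the
>         # attribute must be initialized here.
>         self.http_error = HttpError
> {code}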



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2016) Add support for Dataproc Workflow Templates

2018-01-24 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-2016.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Add support for Dataproc Workflow Templates
> ---
>
> Key: AIRFLOW-2016
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2016
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Dan Sedov
>Assignee: Dan Sedov
>Priority: Minor
> Fix For: 1.10.0
>
>
> Add new operators to support instantiation of Google Cloud Dataproc Workflow 
> Templates.
> See: 
> https://cloud.google.com/dataproc/docs/reference/rest/v1beta2/projects.regions.workflowTemplates



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2025) GoogleCloudStorageDownloadOperator print file contents to log files

2018-01-23 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-2025.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> GoogleCloudStorageDownloadOperator print file contents to log files
> ---
>
> Key: AIRFLOW-2025
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2025
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 1.9.0
>Reporter: Badger
>Assignee: Badger
>Priority: Minor
>  Labels: patch, pull-request-available
> Fix For: 1.10.0
>
>
> The logging for this Operator prints the downloaded file's contents to the 
> log file. This could lead to very large log files if a large file were to be 
> downloaded. 
>  
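> For illustration, a sketch of safer logging (hypothetical operator internals; 
> names approximate the contrib GCS hook API):
> {code}
> def execute(self, context):
>     hook = GoogleCloudStorageHook(
>         google_cloud_storage_conn_id=self.google_cloud_storage_conn_id)
>     file_bytes = hook.download(self.bucket, self.object, self.filename)
>     # Log size and location only; never the payload itself.
>     self.log.info('Downloaded %d bytes from gs://%s/%s',
>                   len(file_bytes), self.bucket, self.object)
> {code}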



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2025) GoogleCloudStorageDownloadOperator print file contents to log files

2018-01-23 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini reassigned AIRFLOW-2025:


Assignee: Badger

> GoogleCloudStorageDownloadOperator print file contents to log files
> ---
>
> Key: AIRFLOW-2025
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2025
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 1.9.0
>Reporter: Badger
>Assignee: Badger
>Priority: Minor
>  Labels: patch, pull-request-available
> Fix For: 1.10.0
>
>
> The logging for this Operator prints the downloaded file's contents to the 
> log file. This could lead to very large log files if a large file were to be 
> downloaded. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2025) GoogleCloudStorageDownloadOperator print file contents to log files

2018-01-23 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336035#comment-16336035
 ] 

Chris Riccomini commented on AIRFLOW-2025:
--

This is also a potential security issue, as the downloaded file might contain 
sensitive information.

> GoogleCloudStorageDownloadOperator print file contents to log files
> ---
>
> Key: AIRFLOW-2025
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2025
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 1.9.0
>Reporter: Badger
>Priority: Minor
>  Labels: patch
>
> The logging for this Operator prints the downloaded file's contents to the 
> log file. This could lead to very large log files if a large file were to be 
> downloaded. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1431) Cannot create connection for GCP using CLI

2018-01-22 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1431.
--
Resolution: Duplicate

> Cannot create connection for GCP using CLI
> --
>
> Key: AIRFLOW-1431
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1431
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: Airflow 1.8
> Environment: Python 3
>Reporter: peay
>Priority: Minor
>
> {{airflow connections --add}} only takes a URI argument; it deduces the 
> connection type from the scheme, and the other fields from the hostname, etc.
> The connection type for the GCP connection is {{google_cloud_platform}}.
> This is not a valid scheme according to {{urllib.parse.urlparse}}:
> {code}
> >>> from urllib.parse import urlparse
> >>> urlparse("google_cloud_platform://hostname")
> ParseResult(scheme='', netloc='', path='google_cloud_platform://hostname', 
> params='', query='', fragment='')
> >>> urlparse("platform://hostname")
> ParseResult(scheme='platform', netloc='hostname', path='', params='', 
> query='', fragment='')
> {code}
> See https://tools.ietf.org/html/rfc3986.html#section-3.1 which specifies 
> {{scheme  = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )}}
>  As a consequence, it is not currently possible to add GCP connections using 
> the CLI.
> Changing the connection name to {{gcp}} would solve it properly but may 
> require lots of small modifications across the codebase. Alternatively, 
> {code}
> if scheme == "gcp":
>     schema = "google_cloud_platform"
> {code}
> right after parsing should be a simple self-contained fix. There is 
> already a similar fix in there for {{postgres -> postgresql}}. On the 
> downside, this introduces a special case that would need to be documented.
> A last option would be to add an argument to override the scheme from the 
> URI. This would be backward compatible.
> I'll be happy to contribute a PR if we can agree on a plan.
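> For illustration, a sketch of the aliasing option (hypothetical mapping, 
> mirroring the existing postgres -> postgresql special case):
> {code}
> from urllib.parse import urlparse
>
> SCHEME_ALIASES = {'postgres': 'postgresql', 'gcp': 'google_cloud_platform'}
>
> result = urlparse('gcp://')
> conn_type = SCHEME_ALIASES.get(result.scheme, result.scheme)
> assert conn_type == 'google_cloud_platform'
> {code}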



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2021) Can't register a GCP connection ID on the web UI

2018-01-22 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334879#comment-16334879
 ] 

Chris Riccomini commented on AIRFLOW-2021:
--

We have seen an issue similar to this. It was caused by having a bad version of 
Flask installed. Restarting from a clean virtual environment fixed the problem.

> Can't register a GCP connection ID on the web UI
> 
>
> Key: AIRFLOW-2021
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2021
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 1.9.0
>Reporter: Yu Ishikawa
>Priority: Major
> Attachments: airflow_connections_error.png
>
>
> I was not able to register a GCP connection ID on the web UI; the error 
> message below appeared. I have attached a screenshot of the web UI.
>  
> {noformat}
> Failed to update record. on_model_change() takes exactly 4 arguments (3 
> given){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2000) Support non-main dataflow job class

2018-01-16 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-2000.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Support non-main dataflow job class
> ---
>
> Key: AIRFLOW-2000
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2000
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib
>Affects Versions: 1.10.0
>Reporter: Feng Lu
>Assignee: Feng Lu
>Priority: Minor
> Fix For: 1.10.0
>
>
> Allow launching a Dataflow job whose runnable class is not the jar's main class. 
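> For illustration, a sketch of how this might look from a DAG (the 
> {{job_class}} parameter name is an assumption):
> {code}
> from airflow import DAG
> from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator
>
> dag = DAG('dataflow_example', schedule_interval=None)
>
> t = DataFlowJavaOperator(
>     task_id='dataflow-job',
>     jar='gs://my-bucket/bundled-pipeline.jar',
>     job_class='com.example.MyPipeline',  # not the jar's main class
>     dag=dag,
> )
> {code}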



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1997) Documentation issues for GCP operators

2018-01-12 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1997.
--
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

> Documentation issues for GCP operators
> --
>
> Key: AIRFLOW-1997
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1997
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: Airflow 2.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.10.0
>
>
> The documentation for the `GoogleCloudStorageToBigQueryOperator` and `DataProc` 
> operators doesn't show the parameters correctly; they're written as if they 
> were standard explanation text. 
> As a result, the documentation is not rendered automatically.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1996) Update DataflowHook waitfordone for Streaming type job

2018-01-12 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1996:
-
Fix Version/s: (was: 2.0.0)
   1.10.0

> Update DataflowHook waitfordone for Streaming type job
> --
>
> Key: AIRFLOW-1996
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1996
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, Dataflow, gcp
>Reporter: Ivan Wirawan
>Assignee: Ivan Wirawan
> Fix For: 1.10.0
>
>
> When I run a Dataflow job of the Streaming type, the Airflow task never 
> finishes, because the job never reaches the Done state and remains Running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1688) Automatically add load.time_partitioning to bigquery_hook when table name includes $

2018-01-11 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1688.
--
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

> Automatically add load.time_partitioning to bigquery_hook when table name 
> includes $
> 
>
> Key: AIRFLOW-1688
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1688
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp, hooks
>Affects Versions: Airflow 1.8
>Reporter: Alberto Calderari
>Assignee: Alberto Calderari
>Priority: Minor
> Fix For: 1.10.0
>
>
> The gcs_to_bq operator throws an exception when trying to auto-create a new 
> date-partitioned table. 
> To allow the table creation, the load configuration needs the API option:
>  load.timePartitioning: {type: 'DAY'}
> I will add a fix that identifies date-partitioned tables from the presence of 
> a $ in the table name and adds the option.
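> For illustration, a sketch of the detection (hypothetical placement inside 
> the hook's load-configuration building):
> {code}
> destination_table = 'my_dataset.my_table$20180101'  # partition decorator
> configuration = {'load': {}}
> if '$' in destination_table:
>     configuration['load']['timePartitioning'] = {'type': 'DAY'}
> {code}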



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1958) Add **kwargs to send_email

2018-01-10 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1958.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Add **kwargs to send_email
> --
>
> Key: AIRFLOW-1958
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1958
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Marcin Szymanski
>Assignee: Marcin Szymanski
>Priority: Minor
> Fix For: 1.10.0
>
>
> Additional parameters can then be used with backends other than SMTP. This 
> also gives greater flexibility when using the same logic with different 
> backends across environments, for example SMTP in prod and a dummy in dev and 
> test.
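> For illustration, a sketch of the pass-through (simplified; the real 
> send_email resolves the backend from the EMAIL_BACKEND config):
> {code}
> def send_email_smtp(to, subject, html_content, **kwargs):
>     pass  # stand-in for the SMTP implementation
>
>
> def send_email(to, subject, html_content, backend=send_email_smtp, **kwargs):
>     # Unrecognized keyword arguments are forwarded untouched, so a
>     # non-SMTP backend can define its own extras (e.g. a dev/test dummy).
>     return backend(to, subject, html_content, **kwargs)
> {code}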



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1969) Google OAuth2 redirect URL generated as http when proxied

2018-01-05 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1969.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Google OAuth2 redirect URL generated as http when proxied
> -
>
> Key: AIRFLOW-1969
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1969
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: Airflow 1.8
>Reporter: Eleanor Berger
>Assignee: Eleanor Berger
> Fix For: 1.10.0
>
>
> The Google OAuth2 authentication plugin generates redirect URLs with the http 
> scheme when requests to Airflow are made using http. This is common in cases 
> where the Airflow web app is served behind a proxy like an AWS load balancer 
> or Nginx.
> The agreed solution is to force the scheme to always be https.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1954) Add a new DataFlowTemplateOperator which runs Dataflow pipeline based on a template.

2018-01-04 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1954.
--
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

> Add a new DataFlowTemplateOperator which runs Dataflow pipeline based on a 
> template.
> 
>
> Key: AIRFLOW-1954
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1954
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, Dataflow, gcp
>Reporter: David Sabater
>Assignee: David Sabater
>  Labels: features
> Fix For: 1.10.0
>
>
> This operator should start a Dataflow pipeline based on a template.
> https://cloud.google.com/dataflow/docs/templates/overview
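> For illustration, a sketch of usage (the template path is a Google-provided 
> example; the exact operator signature is an assumption):
> {code}
> from airflow import DAG
> from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator
>
> dag = DAG('dataflow_template_example', schedule_interval=None)
>
> t = DataflowTemplateOperator(
>     task_id='wordcount-from-template',
>     template='gs://dataflow-templates/latest/Word_Count',
>     parameters={'inputFile': 'gs://my-bucket/input.txt',
>                 'output': 'gs://my-bucket/output'},
>     dag=dag,
> )
> {code}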



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1946) Get a BigQuery Table data in a python array / list

2018-01-03 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1946.
--
Resolution: Fixed

> Get a BigQuery Table data in a python array / list
> --
>
> Key: AIRFLOW-1946
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1946
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp
>Affects Versions: Airflow 2.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> There are hooks already available to get the data from a BigQuery table into a 
> Python list. It would be good to have an Operator that exposes this, so the 
> result can then be passed on via XCom.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1946) Get a BigQuery Table data in a python array / list

2018-01-03 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1946:
-
Fix Version/s: (was: Airflow 2.0)
   1.10.0

> Get a BigQuery Table data in a python array / list
> --
>
> Key: AIRFLOW-1946
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1946
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp
>Affects Versions: Airflow 2.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> There are hooks already available to get the data from a BigQuery table into a 
> Python list. It would be good to have an Operator that exposes this, so the 
> result can then be passed on via XCom.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (AIRFLOW-1953) Add labels to dataflow jobs

2018-01-03 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-1953.

   Resolution: Fixed
Fix Version/s: 1.10.0

> Add labels to dataflow jobs
> ---
>
> Key: AIRFLOW-1953
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1953
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib
>Affects Versions: 1.10.0
>Reporter: Feng Lu
>Assignee: Feng Lu
>Priority: Minor
> Fix For: 1.10.0
>
>
> Extend Dataflow{Java,Python}Operator to allow users to label Dataflow jobs.
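> For illustration, a sketch of how labels might be passed (the options key is 
> an assumption):
> {code}
> from airflow import DAG
> from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator
>
> dag = DAG('dataflow_labels_example', schedule_interval=None)
>
> t = DataFlowJavaOperator(
>     task_id='labeled-dataflow-job',
>     jar='gs://my-bucket/pipeline.jar',
>     options={'labels': {'team': 'data-eng', 'env': 'prod'}},
>     dag=dag,
> )
> {code}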



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1938) Remove tag version check in setup.py

2017-12-19 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1938.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Remove tag version check in setup.py
> 
>
> Key: AIRFLOW-1938
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1938
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
> Fix For: 1.10.0
>
>
> This version check in setup.py is causing problems with releases:
> {noformat}
> tag = repo.git.describe(
> match='[0-9]*', exact_match=True,
> tags=True, dirty=True)
> assert tag == version, (tag, version)
> {noformat}
> The issue is that we need to tag a release as an RC, but we need to have the 
> version of the RC be the final version. For example, tag=1.9.0rc1 and 
> version=1.9.0. This assertion fails when we make an artifact like this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (AIRFLOW-1938) Remove tag version check in setup.py

2017-12-19 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini reassigned AIRFLOW-1938:


Assignee: Chris Riccomini

> Remove tag version check in setup.py
> 
>
> Key: AIRFLOW-1938
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1938
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
> Fix For: 1.10.0
>
>
> This version check in setup.py is causing problems with releases:
> {noformat}
> tag = repo.git.describe(
> match='[0-9]*', exact_match=True,
> tags=True, dirty=True)
> assert tag == version, (tag, version)
> {noformat}
> The issue is that we need to tag a release as an RC, but we need to have the 
> version of the RC be the final version. For example, tag=1.9.0rc1 and 
> version=1.9.0. This assertion fails when we make an artifact like this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1913) Add Delete Operator for GCP Pub/Sub Topics and Create/Delete Operator for Subscriptions

2017-12-18 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1913.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Add Delete Operator for GCP Pub/Sub Topics and Create/Delete Operator for 
> Subscriptions
> ---
>
> Key: AIRFLOW-1913
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1913
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp, hooks
>Affects Versions: Airflow 1.8
>Reporter: Jason Prodonovich
>Assignee: Jason Prodonovich
> Fix For: 1.10.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> There have been requests for Operators that pull messages from Google Cloud 
> Pub/Sub and then act upon the receipt of those messages. In order to 
> facilitate that, additional PubSub Operators must be added to handle creation 
> and deletion of both topics and subscriptions. This will allow for Workflows 
> and end-to-end tests to perform all of the necessary setup and cleanup when 
> wishing to subscribe and process messages from a topic.
> A subsequent issue will be created for the Pull and Acknowledge Operators.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1938) Remove tag version check in setup.py

2017-12-18 Thread Chris Riccomini (JIRA)
Chris Riccomini created AIRFLOW-1938:


 Summary: Remove tag version check in setup.py
 Key: AIRFLOW-1938
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1938
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Chris Riccomini


This version check in setup.py is causing problems with releases:

{noformat}
tag = repo.git.describe(
match='[0-9]*', exact_match=True,
tags=True, dirty=True)
assert tag == version, (tag, version)
{noformat}

The issue is that we need to tag a release as an RC, but we need to have the 
version of the RC be the final version. For example, tag=1.9.0rc1 and 
version=1.9.0. This assertion fails when we make an artifact like this.
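
For illustration, a sketch of a relaxed check that tolerates RC suffixes 
(hypothetical; the ticket as filed simply removes the assertion):

{noformat}
tag = repo.git.describe(
    match='[0-9]*', exact_match=True,
    tags=True, dirty=True)
# Accept tags like "1.9.0rc1" when version is "1.9.0".
assert tag.startswith(version), (tag, version)
{noformat}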



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1525) Fix minor LICENSE & NOTICE issue

2017-12-15 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1525.
--
   Resolution: Fixed
 Assignee: Chris Riccomini
Fix Version/s: (was: 1.9.0)
   1.10.0

> Fix minor LICENSE & NOTICE issue
> 
>
> Key: AIRFLOW-1525
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1525
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Maxime Beauchemin
>Assignee: Chris Riccomini
> Fix For: 1.10.0
>
>
> Per Justin Mclean on the 1.8.2 [VOTE] thread:
> - year in NOTICE is incorrect
> - LICENSE is missing several things (see below)
> LICENSE is missing this BSD licensed file [1], this MIT licensed file [2] 
> (and the MIT header) and the license (and header) for normalize.css in [3].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1525) Fix minor LICENSE & NOTICE issue

2017-12-15 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293262#comment-16293262
 ] 

Chris Riccomini commented on AIRFLOW-1525:
--

Correction: year in NOTICE says 2016 onward. Should be fine.

> Fix minor LICENSE & NOTICE issue
> 
>
> Key: AIRFLOW-1525
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1525
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Maxime Beauchemin
> Fix For: 1.9.0
>
>
> Per Justin Mclean on the 1.8.2 [VOTE] thread:
> - year in NOTICE is incorrect
> - LICENSE is missing several things (see below)
> LICENSE is missing this BSD licensed file [1], this MIT licensed file [2] 
> (and the MIT header) and the license (and header) for normalize.css in [3].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1892) Added capability in BigQuery hook to extract data for selected columns

2017-12-11 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1892.
--
Resolution: Fixed

> Added capability in BigQuery hook to extract data for selected columns
> --
>
> Key: AIRFLOW-1892
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1892
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp
>Affects Versions: 1.10.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The `get_tabledata` method in `bigquery_hook` currently does not support 
> extracting data for specific fields; instead, it extracts the full table.
> The proposal is to add a parameter for extracting data for specific columns 
> from a BigQuery table. The idea after that is to create an operator to get 
> the data in a Python list.
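> For illustration, a sketch of the proposed call (the {{selected_fields}} 
> parameter name is an assumption):
> {code}
> from airflow.contrib.hooks.bigquery_hook import BigQueryHook
>
> hook = BigQueryHook(bigquery_conn_id='bigquery_default')
> result = hook.get_conn().cursor().get_tabledata(
>     dataset_id='my_dataset',
>     table_id='my_table',
>     selected_fields='name,age',  # only fetch these columns
> )
> {code}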



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1829) Support BigQuery schema updates as a side effect of a query job

2017-12-11 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1829.
--
   Resolution: Fixed
 Assignee: Guillermo Rodríguez Cano
Fix Version/s: 1.10.0

> Support BigQuery schema updates as a side effect of a query job
> ---
>
> Key: AIRFLOW-1829
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1829
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp, hooks, operators
>Reporter: Guillermo Rodríguez Cano
>Assignee: Guillermo Rodríguez Cano
>Priority: Critical
>  Labels: easyfix, triaged
> Fix For: 1.10.0
>
>
> BigQuery hook supports schema updates as a side effect of a load job but not 
> of query jobs. Correspondingly, the GCS-to-BQ operator (which executes a load 
> job) supports this possibility, unlike its 'sister' operator, the BQ operator, 
> when running a query with a table as destination.
> Both operations, load and query, should support this feature (experimental as 
> of this writing, though).
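> For illustration, the BigQuery-side knob involved (these are real API option 
> values; where they get wired into the hook is the open question here):
> {code}
> sql = 'SELECT * FROM my_dataset.my_table'
> schema_update_options = [
>     'ALLOW_FIELD_ADDITION',    # new columns may be appended
>     'ALLOW_FIELD_RELAXATION',  # REQUIRED columns may become NULLABLE
> ]
> configuration = {'query': {'query': sql,
>                            'schemaUpdateOptions': schema_update_options}}
> {code}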



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow

2017-12-07 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282154#comment-16282154
 ] 

Chris Riccomini commented on AIRFLOW-15:


[~fenglu], that sounds good to me. This JIRA is actually about removing the 
gcloud library, though. [~fenglu], can you open a new JIRA that proposes what 
you're saying, and assign to yourself for process tracking?

Also, we need to make sure that the compatibility issues are worked out. Last 
time we tried running both in parallel, we had major dependency conflict issues 
(see third bullet point in the description of this JIRA).

> Remove GCloud from Airflow
> --
>
> Key: AIRFLOW-15
> URL: https://issues.apache.org/jira/browse/AIRFLOW-15
> Project: Apache Airflow
>  Issue Type: Task
>  Components: gcp
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
>  Labels: gcp
>
> After speaking with Google, there was some concern about using the 
> [gcloud-python|https://github.com/GoogleCloudPlatform/gcloud-python] library 
> for Airflow. There are several concerns:
> # It's not clear (even to people at Google) what this library is, who owns 
> it, etc.
> # It does not support all services (the way 
> [google-api-python-client|https://github.com/google/google-api-python-client] 
> does).
> # There are compatibility issues between google-api-python-client and 
> gcloud-python.
> We currently support both libraries, depending on which package you 
> install: {{airflow[gcp_api]}} or {{airflow[gcloud]}}. This ticket is to remove 
> the {{airflow[gcloud]}} package, and all associated code.
> The main associated code, afaik, is the use of the {{gcloud}} library in the 
> Google Cloud Storage hooks/operators--specifically for Google Cloud Storage 
> Airflow logging.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow

2017-12-06 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280649#comment-16280649
 ] 

Chris Riccomini commented on AIRFLOW-15:


[~fenglu], perhaps you can poke someone on your end to get guidance on the 
right library to use? What are your thoughts?

> Remove GCloud from Airflow
> --
>
> Key: AIRFLOW-15
> URL: https://issues.apache.org/jira/browse/AIRFLOW-15
> Project: Apache Airflow
>  Issue Type: Task
>  Components: gcp
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
>  Labels: gcp
>
> After speaking with Google, there was some concern about using the 
> [gcloud-python|https://github.com/GoogleCloudPlatform/gcloud-python] library 
> for Airflow. There are several concerns:
> # It's not clear (even to people at Google) what this library is, who owns 
> it, etc.
> # It does not support all services (the way 
> [google-api-python-client|https://github.com/google/google-api-python-client] 
> does).
> # There are compatibility issues between google-api-python-client and 
> gcloud-python.
> We currently support both libraries, depending on which package you 
> install: {{airflow[gcp_api]}} or {{airflow[gcloud]}}. This ticket is to remove 
> the {{airflow[gcloud]}} package, and all associated code.
> The main associated code, afaik, is the use of the {{gcloud}} library in the 
> Google Cloud Storage hooks/operators--specifically for Google Cloud Storage 
> Airflow logging.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow

2017-12-06 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280614#comment-16280614
 ] 

Chris Riccomini commented on AIRFLOW-15:


[~barrywhart], to be honest, I am not sure what the right path forward is. I 
was told specifically by several Google PMs at Google Next last year not to use 
the idiomatic library. Since then, the message you are pointing to has appeared 
on the google python client.

The use case is further muddied by the fact that I'm not convinced an idiomatic 
Python client is actually what we want. The fact that the Google APIs all work 
the same way in the service binding API makes it pretty easy to abstract over a 
lot of the mechanics around interacting with Google in a generic way that can 
be leveraged by all GCP operators. I haven't looked into whether or not this 
overhead would increase if we went to an idiomatic library where interacting 
with GCS might look very different from interacting with Dataflow, etc.

> Remove GCloud from Airflow
> --
>
> Key: AIRFLOW-15
> URL: https://issues.apache.org/jira/browse/AIRFLOW-15
> Project: Apache Airflow
>  Issue Type: Task
>  Components: gcp
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
>  Labels: gcp
>
> After speaking with Google, there was some concern about using the 
> [gcloud-python|https://github.com/GoogleCloudPlatform/gcloud-python] library 
> for Airflow. There are several concerns:
> # It's not clear (even to people at Google) what this library is, who owns 
> it, etc.
> # It does not support all services (the way 
> [google-api-python-client|https://github.com/google/google-api-python-client] 
> does).
> # There are compatibility issues between google-api-python-client and 
> gcloud-python.
> We currently support both libraries, depending on which package you 
> install: {{airflow[gcp_api]}} or {{airflow[gcloud]}}. This ticket is to remove 
> the {{airflow[gcloud]}} package, and all associated code.
> The main associated code, afaik, is the use of the {{gcloud}} library in the 
> Google Cloud Storage hooks/operators--specifically for Google Cloud Storage 
> Airflow logging.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1869) Logging in gcs_task_handler discards too many error messages

2017-12-05 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1869.
--
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

> Logging in gcs_task_handler discards too many error messages
> 
>
> Key: AIRFLOW-1869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1869
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Reporter: William Pursell
>Assignee: William Pursell
>Priority: Minor
> Fix For: 1.10.0
>
>
> Many exceptions are caught and effectively discarded in the gcs task log 
> reader.  It makes debugging difficult.  The logs should be more verbose and 
> include the exception strings.
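> For illustration, a sketch of more verbose handling (hypothetical names for 
> the handler's internals):
> {code}
> def read_remote_log(handler, remote_loc):
>     try:
>         return handler.gcs_read(remote_loc)  # hypothetical reader
>     except Exception as e:
>         # Surface the reason instead of silently discarding it.
>         msg = 'Could not read logs from {}: {}'.format(remote_loc, e)
>         handler.log.error(msg)
>         return msg
> {code}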



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1876) Subtask logs are not easily distinguished

2017-12-05 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1876.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Subtask logs are not easily distinguished
> 
>
> Key: AIRFLOW-1876
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1876
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Reporter: William Pursell
>Assignee: William Pursell
>Priority: Minor
> Fix For: 1.10.0
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> Currently, when the scheduler is outputting all subtask logs to the same 
> stream, it is not easy to distinguish which logs come from which task. It 
> would be nice if there were some convenient way to filter the logs from a 
> given task, for example by putting the task id after the word 'Subtask'.
> For example:
> diff --git a/airflow/task_runner/base_task_runner.py b/airflow/task_runner/base_task_runner.py
> index bc0edcf3..e40f6ea9 100644
> --- a/airflow/task_runner/base_task_runner.py
> +++ b/airflow/task_runner/base_task_runner.py
> @@ -95,7 +95,11 @@ class BaseTaskRunner(LoggingMixin):
>              line = line.decode('utf-8')
>              if len(line) == 0:
>                  break
> -            self.log.info(u'Subtask %s: %s', self._task_instance, line.rstrip('\n'))
> +            self.log.info(
> +                u'Subtask %d: %s',
> +                self._task_instance.job_id,
> +                line.rstrip('\n')
> +            )
> 
>      def run_command(self, run_with, join_args=False):
>          """



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1883) Get File Size for objects in Google Cloud Storage

2017-12-04 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1883.
--
Resolution: Fixed

> Get File Size for objects in Google Cloud Storage
> -
>
> Key: AIRFLOW-1883
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1883
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, gcp
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I would like to get the file size for objects in Google Cloud Storage. The use 
> case: when files are uploaded to a GCS bucket, I want to validate that each 
> file was uploaded correctly by matching the file size.
> Proposed Approach:
> - Add a get_file_size() method to the GCS hook.
> - Create an operator that gets the file size for a file.
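> For illustration, a sketch of the hook method (using the 
> google-api-python-client objects().get call; the exact naming is an 
> assumption):
> {code}
> def get_size(self, bucket, object):
>     service = self.get_conn()
>     obj = service.objects().get(bucket=bucket, object=object).execute()
>     return int(obj['size'])  # the API reports size as a string
> {code}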



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1883) Get File Size for objects in Google Cloud Storage

2017-12-04 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1883:
-
Fix Version/s: 1.10.0

> Get File Size for objects in Google Cloud Storage
> -
>
> Key: AIRFLOW-1883
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1883
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, gcp
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I would like to get the file size for objects in Google Cloud Storage. The use 
> case: when files are uploaded to a GCS bucket, I want to validate that each 
> file was uploaded correctly by matching the file size.
> Proposed Approach:
> - Add a get_file_size() method to the GCS hook.
> - Create an operator that gets the file size for a file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1866) Fix missing parameters in docstring for copy function in gcs_hook

2017-12-01 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1866.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Fix missing parameters in docstring for copy function in gcs_hook
> -
>
> Key: AIRFLOW-1866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1866
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, gcp
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The docstring for the copy method in gcs_hook.py is outdated. PyCharm warns 
> that (source_bucket, source_object) are missing from the docstring.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1855) Add an Operator to copy files with a specific delimiter in a directory from one GCS bucket to another

2017-12-01 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1855.
--
Resolution: Fixed

> Add an Operator to copy files with a specific delimiter in a directory from 
> one GCS bucket to another
> -
>
> Key: AIRFLOW-1855
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1855
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, gcp
>Affects Versions: 1.10.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Use case: Copy all the CSV/JSON files from a particular directory in a bucket 
> to a specific directory in another bucket (or the same one).
> Proposed Approach:
> - Add a 'delimiter' argument to the GCS hook to filter files with a particular 
> delimiter.
> - Get the list of files to copy, filtered by delimiter, using the 'list' 
> method in the GCS hook.
> - Loop over the list with the 'copy' method in the GCS hook. 
> Note: Under the hood GCS has no directories. Files are just objects.
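> For illustration, a sketch of the approach described above (hook method 
> signatures approximate the contrib GCS hook):
> {code}
> from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
>
> hook = GoogleCloudStorageHook()
> # 'delimiter' here filters by suffix, e.g. all CSV files under data/.
> for obj in hook.list('source-bucket', prefix='data/', delimiter='.csv'):
>     hook.copy('source-bucket', obj, 'dest-bucket', obj)
> {code}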



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1873) Task operator logs appear in wrong numbered log file

2017-11-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1873:
-
Fix Version/s: 1.9.0

> Task operator logs appear in wrong numbered log file
> 
>
> Key: AIRFLOW-1873
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1873
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Ash Berlin-Taylor
> Fix For: 1.9.0
>
>
> The logs for the running operators appear in the "next" task number.
> For example, for the first try for a given task instance the "collecting dag" 
> etc appear in 1.log, but log messages from the operator itself appear in 
> 2.log.
> 1.log:
> {noformat}
> [2017-11-30 23:14:44,189] {cli.py:374} INFO - Running on host 4f1698e8ae61
> [2017-11-30 23:14:44,254] {models.py:1173} INFO - Dependencies all met for 
> 
> [2017-11-30 23:14:44,265] {models.py:1173} INFO - Dependencies all met for 
> 
> [2017-11-30 23:14:44,266] {models.py:1383} INFO -
> 
> Starting attempt 1 of 1
> 
> [2017-11-30 23:14:44,290] {models.py:1404} INFO - Executing 
>  on 2017-11-20 00:00:00
> [2017-11-30 23:14:44,291] {base_task_runner.py:115} INFO - Running: ['bash', 
> '-c', 'airflow run tests test-logging 2017-11-20T00:00:00 --job_id 4 --raw 
> -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-11-30 23:14:50,054] {base_task_runner.py:98} INFO - Subtask: 
> [2017-11-30 23:14:50,052] {configuration.py:206} WARNING - section/key 
> [celery/celery_ssl_active] not found in config
> [2017-11-30 23:14:50,056] {base_task_runner.py:98} INFO - Subtask: 
> [2017-11-30 23:14:50,052] {default_celery.py:41} WARNING - Celery Executor 
> will run without SSL
> [2017-11-30 23:14:50,058] {base_task_runner.py:98} INFO - Subtask: 
> [2017-11-30 23:14:50,054] {__init__.py:45} INFO - Using executor 
> CeleryExecutor
> [2017-11-30 23:14:50,529] {base_task_runner.py:98} INFO - Subtask: 
> [2017-11-30 23:14:50,529] {models.py:189} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-11-30 23:14:50,830] {base_task_runner.py:98} INFO - Subtask: 
> [2017-11-30 23:14:50,825] {python_operator.py:90} INFO - Done. Returned value 
> was: None
> {noformat}
> 2.log:
> {noformat}
> [2017-11-30 23:14:50,749] {cli.py:374} INFO - Running on host 4f1698e8ae61
> [2017-11-30 23:14:50,820] {logging_mixin.py:84} INFO - Hi from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-11-30 23:14:50,824] {csv_to_parquet.py:21} ERROR - Hello
> {noformat}
> Notice the timestamps - the contents of 2.log appear just before the last 
> line of 1.log, and should be in the same log file (there is only a single run 
> of this task instance)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1843) Add Google Cloud Storage Sensor with prefix

2017-11-27 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1843.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Add Google Cloud Storage Sensor with prefix
> ---
>
> Key: AIRFLOW-1843
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1843
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp
>Affects Versions: 1.8.1, 1.9.1
>Reporter: Igors Vaitkus
>Assignee: Igors Vaitkus
>Priority: Minor
> Fix For: 1.10.0
>
>
> The hook can list objects in a bucket with a prefix, so I need a sensor that 
> will check a bucket with a prefix for any incoming files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-868) Add PostgresToGCSOperator

2017-11-27 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-868.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

> Add PostgresToGCSOperator
> -
>
> Key: AIRFLOW-868
> URL: https://issues.apache.org/jira/browse/AIRFLOW-868
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Adam Boscarino
>Assignee: Adam Boscarino
>Priority: Trivial
> Fix For: 1.10.0
>
>
> As a user, I would like the ability to extract data from a Postgres database 
> to Google Cloud Storage in a manner similar to the existing MySQL 
> implementation. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-11-27 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1613.
--
   Resolution: Fixed
Fix Version/s: (was: 1.9.0)
   1.10.0

> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.10.0
>
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation exhausts {{schema}} on the first row, 
> and every subsequent iteration sees an empty schema:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
> # Convert datetime objects to utc seconds, and decimals to floats
> row = map(self.convert_types, row)
> row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened as binary, but strings are written to it, producing the 
> error `a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3.
> The operator currently does not support binary columns in MySQL. We should 
> support uploading binary columns from MySQL to cloud storage, as it's a pretty 
> common use case. 
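> For illustration, a sketch of fixes for points 1 and 2, continuing the snippet 
> above (point 3 needs a separate type-mapping change):
> {code}
> from tempfile import NamedTemporaryFile
>
> # 1. Materialize the schema so it survives repeated iteration.
> schema = list(map(lambda schema_tuple: schema_tuple[0], cursor.description))
>
> # 2. Open in text mode, since strings are written to the file.
> tmp_file_handle = NamedTemporaryFile(mode='w', delete=True)
> {code}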



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1795) S3Hook no longer accepts s3_conn_id breaking built-in ops/sensors and back-compat

2017-11-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255601#comment-16255601
 ] 

Chris Riccomini commented on AIRFLOW-1795:
--

That said, all of these seem like fine-enough short term solutions to me.

> S3Hook no longer accepts s3_conn_id breaking built-in ops/sensors and 
> back-compat 
> --
>
> Key: AIRFLOW-1795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1795
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Ash Berlin-Taylor
> Fix For: 1.9.0
>
>
> Found whilst testing Airflow 1.9.0rc1
> Previously the S3Hook accepted a parameter of {{s3_conn_id}}. As part of 
> AIRFLOW-1520 we moved S3Hook to have a superclass of AWSHook, which accepts an 
> {{aws_conn_id}} parameter instead.
> This breaks back-compat generally, and more specifically it breaks the 
> built-in S3KeySensor, which does this:
> {code}
> def poke(self, context):
>     from airflow.hooks.S3_hook import S3Hook
>     hook = S3Hook(s3_conn_id=self.s3_conn_id)
> {code}
> There are a few other instances of s3_conn_id in the code base that will also 
> probably need updating/tweaking.
> My first thought was to add a shim mapping s3_conn_id to aws_conn_id in the 
> S3Hook with a deprecation warning, but the surface area of places where this 
> is exposed is larger. I could add such a deprecation warning to all of these. 
> Anyone have thoughts as to the best way?
> - Rename all instances with deprecation warnings.
> - S3Hook accepts {{s3_conn_id}} and passes down to {{aws_conn_id}} in the 
> superclass.
> - Update existing references in the code base to {{aws_conn_id}}, and note in 
> UPDATING.md the need to update user code. (This is my least preferred 
> option.)
> {noformat}
> airflow/operators/redshift_to_s3_operator.py
> 33::param s3_conn_id: reference to a specific S3 connection
> 34::type s3_conn_id: string
> 51:s3_conn_id='s3_default',
> 62:self.s3_conn_id = s3_conn_id
> 69:self.s3 = S3Hook(s3_conn_id=self.s3_conn_id)
> airflow/operators/s3_file_transform_operator.py
> 40::param source_s3_conn_id: source s3 connection
> 41::type source_s3_conn_id: str
> 44::param dest_s3_conn_id: destination s3 connection
> 45::type dest_s3_conn_id: str
> 62:source_s3_conn_id='s3_default',
> 63:dest_s3_conn_id='s3_default',
> 68:self.source_s3_conn_id = source_s3_conn_id
> 70:self.dest_s3_conn_id = dest_s3_conn_id
> 75:source_s3 = S3Hook(s3_conn_id=self.source_s3_conn_id)
> 76:dest_s3 = S3Hook(s3_conn_id=self.dest_s3_conn_id)
> airflow/operators/s3_to_hive_operator.py
> 74::param s3_conn_id: source s3 connection
> 75::type s3_conn_id: str
> 102:s3_conn_id='s3_default',
> 119:self.s3_conn_id = s3_conn_id
> 130:self.s3 = S3Hook(s3_conn_id=self.s3_conn_id)
> airflow/operators/sensors.py
> 504::param s3_conn_id: a reference to the s3 connection
> 505::type s3_conn_id: str
> 514:s3_conn_id='s3_default',
> 531:self.s3_conn_id = s3_conn_id
> 535:hook = S3Hook(s3_conn_id=self.s3_conn_id)
> 568:s3_conn_id='s3_default',
> 576:self.s3_conn_id = s3_conn_id
> 582:hook = S3Hook(s3_conn_id=self.s3_conn_id)
> {noformat}
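> For illustration, a sketch of the second option, a shim in S3Hook 
> (hypothetical):
> {code}
> import warnings
>
> from airflow.contrib.hooks.aws_hook import AwsHook
>
>
> class S3Hook(AwsHook):
>     def __init__(self, aws_conn_id='aws_default', s3_conn_id=None,
>                  *args, **kwargs):
>         if s3_conn_id is not None:
>             warnings.warn('s3_conn_id is deprecated; use aws_conn_id',
>                           DeprecationWarning)
>             aws_conn_id = s3_conn_id
>         super(S3Hook, self).__init__(aws_conn_id=aws_conn_id, *args, **kwargs)
> {code}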



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1795) S3Hook no longer accepts s3_conn_id breaking built-in ops/sensors and back-compat

2017-11-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255600#comment-16255600
 ] 

Chris Riccomini commented on AIRFLOW-1795:
--

We did the same thing for GCP a while ago. We did the approach you prefer: 
update all existing references, and add a note in UPDATING.md.

> S3Hook no longer accepts s3_conn_id breaking built-in ops/sensors and 
> back-compat 
> --
>
> Key: AIRFLOW-1795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1795
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Ash Berlin-Taylor
> Fix For: 1.9.0
>
>
> Found whilst testing Airflow 1.9.0rc1
> Previously the S3Hook accepted a parameter of {{s3_conn_id}}. As part of 
> AIRFLOW-1520 we moved S3Hook to have a superclass of AWSHook, which accepts an 
> {{aws_conn_id}} parameter instead.
> This breaks back-compat generally, and more specifically it breaks the 
> built-in S3KeySensor, which does this:
> {code}
> def poke(self, context):
>     from airflow.hooks.S3_hook import S3Hook
>     hook = S3Hook(s3_conn_id=self.s3_conn_id)
> {code}
> There are a few other instances of s3_conn_id in the code base that will also 
> probably need updating/tweaking.
> My first thought was to add a shim mapping s3_conn_id to aws_conn_id in the 
> S3Hook with a deprecation warning, but the surface area of places where this 
> is exposed is larger. I could add such a deprecation warning to all of these. 
> Anyone have thoughts as to the best way?
> - Rename all instances with deprecation warnings.
> - S3Hook accepts {{s3_conn_id}} and passes down to {{aws_conn_id}} in the 
> superclass.
> - Update existing references in the code base to {{aws_conn_id}}, and note in 
> UPDATING.md the need to update user code. (This is my least preferred 
> option.)
> {noformat}
> airflow/operators/redshift_to_s3_operator.py
> 33::param s3_conn_id: reference to a specific S3 connection
> 34::type s3_conn_id: string
> 51:s3_conn_id='s3_default',
> 62:self.s3_conn_id = s3_conn_id
> 69:self.s3 = S3Hook(s3_conn_id=self.s3_conn_id)
> airflow/operators/s3_file_transform_operator.py
> 40::param source_s3_conn_id: source s3 connection
> 41::type source_s3_conn_id: str
> 44::param dest_s3_conn_id: destination s3 connection
> 45::type dest_s3_conn_id: str
> 62:source_s3_conn_id='s3_default',
> 63:dest_s3_conn_id='s3_default',
> 68:self.source_s3_conn_id = source_s3_conn_id
> 70:self.dest_s3_conn_id = dest_s3_conn_id
> 75:source_s3 = S3Hook(s3_conn_id=self.source_s3_conn_id)
> 76:dest_s3 = S3Hook(s3_conn_id=self.dest_s3_conn_id)
> airflow/operators/s3_to_hive_operator.py
> 74::param s3_conn_id: source s3 connection
> 75::type s3_conn_id: str
> 102:s3_conn_id='s3_default',
> 119:self.s3_conn_id = s3_conn_id
> 130:self.s3 = S3Hook(s3_conn_id=self.s3_conn_id)
> airflow/operators/sensors.py
> 504::param s3_conn_id: a reference to the s3 connection
> 505::type s3_conn_id: str
> 514:s3_conn_id='s3_default',
> 531:self.s3_conn_id = s3_conn_id
> 535:hook = S3Hook(s3_conn_id=self.s3_conn_id)
> 568:s3_conn_id='s3_default',
> 576:self.s3_conn_id = s3_conn_id
> 582:hook = S3Hook(s3_conn_id=self.s3_conn_id)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (AIRFLOW-1795) S3Hook no longer accepts s3_conn_id breaking build in ops/sensors and back-compat

2017-11-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255600#comment-16255600
 ] 

Chris Riccomini edited comment on AIRFLOW-1795 at 11/16/17 5:01 PM:


We did the same thing for GCP a while ago. We did your least preferred 
approach: update all existing references, and add a note in UPDATING.md. :) It 
all worked out fine.


was (Author: criccomini):
We did the same thing for GCP a while ago. We did the approach you prefer: 
update all existing references, and add a note in UPDATING.md.

> S3Hook no longer accepts s3_conn_id breaking build in ops/sensors and 
> back-compat 
> --
>
> Key: AIRFLOW-1795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1795
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Ash Berlin-Taylor
> Fix For: 1.9.0
>
>
> Found whilst testing Airflow 1.9.0rc1
> Previously the S3Hook accepted a parameter of {{s3_conn_id}}. As part of 
> AIRFLOW-1520 we moved S3Hook to have a superclass of AWSHook, which accepts a 
> {{aws_conn_id}} parameter instead.
> This breaks back-compat generally, and more specifically it breaks the built 
> in S3KeySensor which does this:
> {code}
> def poke(self, context):
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(s3_conn_id=self.s3_conn_id)
> {code}
> There are a few other instances of s3_conn_id in the code base that will also 
> probably need updating/tweaking.
> My first thought was to add a shim mapping s3_conn_id to aws_conn_id in the 
> S3Hook with a deprecation warning, but the surface area of places where this 
> is exposed is larger. I could add such a deprecation warning to all of these. 
> Anyone have thoughts as to best way?
> - Rename all instances with deprecation warnings.
> - S3Hook accepts {{s3_conn_id}} and passes down to {{aws_conn_id}} in 
> superclass.
> - Update existing references in the code base to {{aws_conn_id}}, and note in 
> UPDATING.md the need to update user code. (This is my least preferred 
> option.)
> {noformat}
> airflow/operators/redshift_to_s3_operator.py
> 33::param s3_conn_id: reference to a specific S3 connection
> 34::type s3_conn_id: string
> 51:s3_conn_id='s3_default',
> 62:self.s3_conn_id = s3_conn_id
> 69:self.s3 = S3Hook(s3_conn_id=self.s3_conn_id)
> airflow/operators/s3_file_transform_operator.py
> 40::param source_s3_conn_id: source s3 connection
> 41::type source_s3_conn_id: str
> 44::param dest_s3_conn_id: destination s3 connection
> 45::type dest_s3_conn_id: str
> 62:source_s3_conn_id='s3_default',
> 63:dest_s3_conn_id='s3_default',
> 68:self.source_s3_conn_id = source_s3_conn_id
> 70:self.dest_s3_conn_id = dest_s3_conn_id
> 75:source_s3 = S3Hook(s3_conn_id=self.source_s3_conn_id)
> 76:dest_s3 = S3Hook(s3_conn_id=self.dest_s3_conn_id)
> airflow/operators/s3_to_hive_operator.py
> 74::param s3_conn_id: source s3 connection
> 75::type s3_conn_id: str
> 102:s3_conn_id='s3_default',
> 119:self.s3_conn_id = s3_conn_id
> 130:self.s3 = S3Hook(s3_conn_id=self.s3_conn_id)
> airflow/operators/sensors.py
> 504::param s3_conn_id: a reference to the s3 connection
> 505::type s3_conn_id: str
> 514:s3_conn_id='s3_default',
> 531:self.s3_conn_id = s3_conn_id
> 535:hook = S3Hook(s3_conn_id=self.s3_conn_id)
> 568:s3_conn_id='s3_default',
> 576:self.s3_conn_id = s3_conn_id
> 582:hook = S3Hook(s3_conn_id=self.s3_conn_id)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1816) Add missing region param to DataProc{Pig,Hive,SparkSql}Operators

2017-11-15 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1816.
--
Resolution: Fixed

> Add missing region param to DataProc{Pig,Hive,SparkSql}Operators
> 
>
> Key: AIRFLOW-1816
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1816
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Dan Sedov
>Assignee: Dan Sedov
>Priority: Minor
> Fix For: 1.10.0
>
>
> Add region field to the remainder of Dataproc Jobs that were missed under 
> AIRFLOW-1576.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1816) Add missing region param to DataProc{Pig,Hive,SparkSql}Operators

2017-11-15 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1816:
-
Fix Version/s: 1.10.0

> Add missing region param to DataProc{Pig,Hive,SparkSql}Operators
> 
>
> Key: AIRFLOW-1816
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1816
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Dan Sedov
>Assignee: Dan Sedov
>Priority: Minor
> Fix For: 1.10.0
>
>
> Add region field to the remainder of Dataproc Jobs that were missed under 
> AIRFLOW-1576.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1787) Fix batch clear RUNNING task instance and inconsistent timestamp format bugs

2017-11-13 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1787:
-
Fix Version/s: (was: 1.9.0)

> Fix batch clear RUNNING task instance and inconsistent timestamp format bugs
> 
>
> Key: AIRFLOW-1787
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1787
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Kevin Yang
>Assignee: Kevin Yang
> Fix For: 1.10.0
>
>
> * Batch clear in CRUD is not working for task instances in RUNNING state; 
> this needs to be fixed
> * Batch clear and set status are not working for manually triggered task 
> instances because manually triggered task instances have a different 
> execution date format.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1787) Fix batch clear RUNNING task instance and inconsistent timestamp format bugs

2017-11-13 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1787:
-
Fix Version/s: 1.10.0

> Fix batch clear RUNNING task instance and inconsistent timestamp format bugs
> 
>
> Key: AIRFLOW-1787
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1787
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Kevin Yang
>Assignee: Kevin Yang
> Fix For: 1.10.0
>
>
> * Batch clear in CRUD is not working for task instances in RUNNING state; 
> this needs to be fixed
> * Batch clear and set status are not working for manually triggered task 
> instances because manually triggered task instances have a different 
> execution date format.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1765) Default API auth backend should deny all.

2017-10-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1765:
-
Priority: Major  (was: Blocker)

> Default API auth backend should deny all.
> 
>
> Key: AIRFLOW-1765
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1765
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api, authentication
>Affects Versions: 1.8.2
>Reporter: Ash Berlin-Taylor
>  Labels: security
> Fix For: 1.9.0
>
>
> It has been discovered that the experimental API in the default configuration 
> is not protected behind any authentication.
> This means that out of the box the Airflow webserver's /api/experimental/ can 
> be requested by anyone, meaning pools can be updated/deleted and task 
> instance variables can be read.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1764) Web Interface should not use experimental api

2017-10-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1764:
-
Priority: Major  (was: Blocker)

> Web Interface should not use experimental api
> -
>
> Key: AIRFLOW-1764
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1764
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Reporter: Niels Zeilemaker
>Assignee: Niels Zeilemaker
> Fix For: 1.9.0
>
>
> The web interface should not use the experimental api as the authentication 
> options differ between the two. This means that the latest_runs call should 
> be moved into the web interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1764) Web Interface should not use experimental api

2017-10-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1764:
-
Priority: Blocker  (was: Minor)

> Web Interface should not use experimental api
> -
>
> Key: AIRFLOW-1764
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1764
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Reporter: Niels Zeilemaker
>Assignee: Niels Zeilemaker
>Priority: Blocker
> Fix For: 1.9.0
>
>
> The web interface should not use the experimental api as the authentication 
> options differ between the two. This means that the latest_runs call should 
> be moved into the web interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1764) Web Interface should not use experimental api

2017-10-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1764:
-
Fix Version/s: 1.9.0

> Web Interface should not use experimental api
> -
>
> Key: AIRFLOW-1764
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1764
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Reporter: Niels Zeilemaker
>Assignee: Niels Zeilemaker
>Priority: Blocker
> Fix For: 1.9.0
>
>
> The web interface should not use the experimental api as the authentication 
> options differ between the two. This means that the latest_runs call should 
> be moved into the web interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1723) Support sendgrid in email backend

2017-10-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1723.
--
Resolution: Fixed

> Support sendgrid in email backend 
> --
>
> Key: AIRFLOW-1723
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1723
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: utils
>Affects Versions: 1.9.0
>Reporter: Feng Lu
>Assignee: Feng Lu
>Priority: Minor
> Fix For: 1.10.0
>
>
> The current Airflow email backend only supports SMTP; this PR aims to extend 
> the email backend and integrate with SendGrid. The Airflow config file is 
> also updated to include a [sendgrid] section where users can specify api_key 
> and mail_from. 
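
For reference, a hedged sketch of what the resulting airflow.cfg might look 
like; the backend path and keys are assumptions based on the description above:

{noformat}
[email]
email_backend = airflow.contrib.utils.sendgrid.send_email

[sendgrid]
api_key = YOUR_SENDGRID_API_KEY
mail_from = airflow@example.com
{noformat}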



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1018) Scheduler DAG processes can not log to stdout

2017-10-27 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1018:
-
Priority: Blocker  (was: Critical)

> Scheduler DAG processes can not log to stdout
> -
>
> Key: AIRFLOW-1018
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1018
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
> Environment: Airflow 1.8.0
>Reporter: Vincent Poulain
>Assignee: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.9.0
>
>
> Each DAG has its own log file for the scheduler and we can specify the 
> directory with child_process_log_directory param. 
> Unfortunately we cannot change this to a device, e.g. by specifying 
> /dev/stdout. That would be very useful when we execute Airflow in a container.
> When we specify /dev/stdout it raises:
> "OSError: [Errno 20] Not a directory: '/dev/stdout/2017-03-19'"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1711) Ldap Attributes not always a "list" part 2

2017-10-27 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1711:
-
Priority: Blocker  (was: Major)

> Ldap Attributes not always a "list" part 2
> --
>
> Key: AIRFLOW-1711
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1711
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: Airflow 1.7.1
> Environment: Linux + Active Directory
>Reporter: Steve Jacobs
>Priority: Blocker
>
> In the LDAP auth module,
> `group_contains_user` checks for `resp['attributes'].get(user_name_attr)[0] 
> == username`.
> Some LDAP servers apparently return this attribute as a simple string, so
> `resp['attributes'].get(user_name_attr) == username` 
> should also be checked. 
> But really, a test should be done to see whether the return value is a list 
> and perform the check accordingly. If it's not a list, Python will check 
> both arguments and exit with an error. 
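
A minimal sketch of the proposed type check, assuming resp['attributes'] 
behaves like a dict whose values may be either a string or a list of strings 
(the helper name is illustrative):

{code:python}
def name_matches(resp, user_name_attr, username):
    # The attribute may come back as a list or as a plain string,
    # depending on the LDAP server.
    attr = resp['attributes'].get(user_name_attr)
    if isinstance(attr, list):
        return len(attr) > 0 and attr[0] == username
    return attr == username
{code}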



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1758) Print full traceback on errors

2017-10-26 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221308#comment-16221308
 ] 

Chris Riccomini commented on AIRFLOW-1758:
--

Is this fixed by AIRFLOW-1732?

> Print full traceback on errors
> --
>
> Key: AIRFLOW-1758
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1758
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Reporter: Maximilian Roos
>Priority: Minor
>
> Currently when there is a failure during a run, it's difficult to see what 
> the cause was. Could we at least print the Python stack trace? 
> As an example: 
> {code:python}
> [2017-10-26 21:43:38,155] {models.py:1563} ERROR - DataFlow failed with 
> return code 1
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1461, 
> in _run_raw_task
> result = task_copy.execute(context=context)
>   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/contrib/operators/dataflow_operator.py",
>  line 192, in execute
> self.py_file, self.py_options)
>   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/contrib/hooks/gcp_dataflow_hook.py",
>  line 155, in start_python_dataflow
> task_id, variables, dataflow, name, ["python"] + py_options)
>   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/contrib/hooks/gcp_dataflow_hook.py",
>  line 141, in _start_dataflow
> _Dataflow(cmd).wait_for_done()
>   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/contrib/hooks/gcp_dataflow_hook.py",
>  line 122, in wait_for_done
> self._proc.returncode))
> Exception: DataFlow failed with return code 1
> {code}
> I then need to jump into a REPL and attempt to simulate the command that 
> airflow would have run, which is both difficult and error-prone. (Or is there 
> a simpler way of doing this??)
> I then get a better stack-trace:
> {code:python}
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 
> 328, in run
> return self.runner.run(self)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 283, in run
> self.dataflow_client.create_job(self.job), self)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", 
> line 168, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 423, in create_job
> self.create_job_description(job)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 446, in create_job_description
> job.options, file_copy=self._gcs_file_copy)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/dependency.py",
>  line 347, in stage_job_resources
> build_setup_args)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/dependency.py",
>  line 439, in _build_setup_package
> os.chdir(os.path.dirname(setup_file))
> OSError: [Errno 2] No such file or directory: ''
> {code}
> Somewhat related to: https://issues.apache.org/jira/browse/AIRFLOW-174
> (I'm using the DataFlowPythonOperator at the moment, but I suspect the issue 
> is wider)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1641) Task gets stuck in queued state

2017-10-24 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217453#comment-16217453
 ] 

Chris Riccomini commented on AIRFLOW-1641:
--

In this case I'm using Blocker as a proxy for "we want to include this in 
1.9.0". Agree there are work arounds.

> Task gets stuck in queued state
> ---
>
> Key: AIRFLOW-1641
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1641
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.8.0
> Environment: Linux
>Reporter: Mas
>Assignee: Bolke de Bruin
>Priority: Blocker
>  Labels: queued, scheduler, stuck, task
> Fix For: 1.9.0
>
>
> Hello,
> I have one DAG with ~20 tasks. 
> The DAG runs daily and some tasks can sometimes last for hours, depending on 
> the data being processed.
> There are some interactions with AWS and a remote DB.
> I only use LocalExecutor.
> What this issue is about is the fact that sometimes (randomly, and without 
> any clear reason) one of the tasks (here also, it is random) gets stuck in 
> "queued" state and never starts running. 
> The manual workaround is to restart the task manually by clearing it.
> Does anyone have ideas about the issue behind, and how to avoid it for the 
> future? 
> Thanks in advance for your help.
> PS: other people are facing the same behaviour: 
> [link|https://stackoverflow.com/questions/45853013/airflow-tasks-get-stuck-at-queued-status-and-never-gets-running]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1744) task.retries can be False

2017-10-24 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1744:
-
Priority: Blocker  (was: Major)

> task.retries can be False 
> --
>
> Key: AIRFLOW-1744
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1744
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.9.0
>
>
> When adding the max_tries field, task.retries can be False (e.g. in case of a 
> faulty DAG). At least Postgres will not accept "False" for an integer field. 
> It is proposed to set it to try_number in case try_number > 0, otherwise to 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1013) airflow/jobs.py:manage_slas() exception for @once dag

2017-10-24 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1013:
-
Priority: Critical  (was: Major)

> airflow/jobs.py:manage_slas() exception for @once dag
> -
>
> Key: AIRFLOW-1013
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1013
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8, 1.8.1, 1.8.0, 1.8.2
>Reporter: Ruslan Dautkhanov
>Assignee: Muhammad Ahmmad
>Priority: Critical
>  Labels: dagrun, once, scheduler, sla
> Fix For: 1.9.0
>
>
> Getting following exception 
> {noformat}
> [2017-03-19 20:16:25,786] {jobs.py:354} DagFileProcessor2638 ERROR - Got an 
> exception! Propagating...
> Traceback (most recent call last):
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 346, in helper
> pickle_dags)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 1581, in process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 1175, in _process_dags
> self.manage_slas(dag)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 595, in manage_slas
> while dttm < datetime.now():
> TypeError: can't compare datetime.datetime to NoneType
> {noformat}
> Exception is in airflow/jobs.py:manage_slas() :
> https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L595
> {code}
> ts = datetime.now()
> SlaMiss = models.SlaMiss
> for ti in max_tis:
>     task = dag.get_task(ti.task_id)
>     dttm = ti.execution_date
>     if task.sla:
>         dttm = dag.following_schedule(dttm)
>   >>>   while dttm < datetime.now():  <<< here
>             following_schedule = dag.following_schedule(dttm)
>             if following_schedule + task.sla < datetime.now():
>                 session.merge(models.SlaMiss(
>                     task_id=ti.task_id,
> {code}
> It seems that dag.following_schedule() returns None for @once dag?
> Here's how dag is defined:
> {code}
> main_dag = DAG(
>     dag_id              = 'DISCOVER-Oracle-Load',
>     default_args        = default_args,
>     user_defined_macros = dag_macros,
>     start_date          = datetime.now(),
>     catchup             = False,
>     schedule_interval   = '@once',
>     concurrency         = 2,
>     max_active_runs     = 1,
>     dagrun_timeout      = timedelta(days=4),
> )
> {code}
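
A guard along the following lines would avoid the None comparison; this is a 
hedged sketch of one possible fix, not the patch that was merged:

{code:python}
dttm = dag.following_schedule(dttm)
# following_schedule() returns None for '@once' DAGs, so skip the SLA
# bookkeeping entirely rather than comparing None to datetime.now().
while dttm is not None and dttm < datetime.now():
    following_schedule = dag.following_schedule(dttm)
    if following_schedule + task.sla < datetime.now():
        session.merge(models.SlaMiss(task_id=ti.task_id))
    dttm = following_schedule
{code}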



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1641) Task gets stuck in queued state

2017-10-24 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1641:
-
Priority: Blocker  (was: Major)

> Task gets stuck in queued state
> ---
>
> Key: AIRFLOW-1641
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1641
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.8.0
> Environment: Linux
>Reporter: Mas
>Assignee: Bolke de Bruin
>Priority: Blocker
>  Labels: queued, scheduler, stuck, task
> Fix For: 1.9.0
>
>
> Hello,
> I have one DAG with ~20 tasks. 
> The DAG runs daily and some tasks can sometimes last for hours, depending on 
> the data being processed.
> There are some interactions with AWS and a remote DB.
> I only use LocalExecutor.
> What this issue is about is the fact that sometimes (randomly, and without 
> any clear reason) one of the tasks (here also, it is random) gets stuck in 
> "queued" state and never starts running. 
> The manual workaround is to restart the task manually by clearing it.
> Does anyone have ideas about the issue behind, and how to avoid it for the 
> future? 
> Thanks in advance for your help.
> PS: other people are facing the same behaviour: 
> [link|https://stackoverflow.com/questions/45853013/airflow-tasks-get-stuck-at-queued-status-and-never-gets-running]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1330) Connection.parse_from_uri doesn't work for google_cloud_platform and so on

2017-10-24 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1330:
-
Fix Version/s: (was: 1.10.0)
   1.9.0

> Connection.parse_from_uri doesn't work for google_cloud_platform and so on
> --
>
> Key: AIRFLOW-1330
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1330
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Yu Ishikawa
>Assignee: Shintaro Murakami
> Fix For: 1.9.0
>
>
> h2. Overview
> {{Connection.parse_from_uri}} doesn't work for some types like 
> {{google_cloud_platform}} whose type name includes underscores, since 
> {{urllib.parse.urlparse()}}, which is used in {{Connection.parse_from_uri}}, 
> doesn't support a scheme name that includes underscores.
> So, airflow's CLI doesn't work when a given connection URI includes 
> underscores, like {{google_cloud_platform://X}}.
> h3. Workaround
> https://medium.com/@yuu.ishikawa/apache-airflow-how-to-add-a-connection-to-google-cloud-with-cli-af2cc8df138d
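
The underlying behavior is easy to reproduce: an underscore is not a legal 
scheme character, so urlparse silently treats the whole URI as a path:

{code:python}
from urllib.parse import urlparse  # on Python 2: from urlparse import urlparse

parsed = urlparse('google_cloud_platform://user:pass@host/schema')
print(parsed.scheme)  # '' -- the underscore disqualifies the scheme
print(parsed.netloc)  # '' -- everything lands in parsed.path instead
print(parsed.path)    # 'google_cloud_platform://user:pass@host/schema'
{code}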



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1732) Improve Dataflow Hook Logging

2017-10-24 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1732:
-
Fix Version/s: 1.10.0

> Improve Dataflow Hook Logging
> -
>
> Key: AIRFLOW-1732
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1732
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Trevor Edwards
>Assignee: Trevor Edwards
> Fix For: 1.10.0
>
>
> Logging output for Dataflow hook could be a bit more useful. Namely:
> # Log the command that is used for opening a dataflow subprocess
> # If the dataflow subprocess experiences an error, log that error at warning 
> (instead of debug)
> Currently, errors are extremely opaque, only showing the exit code. The 
> command used is unknown, and while the error is logged, it is at the debug 
> level, which makes it difficult to find.
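
A minimal sketch of the two changes, assuming a simple subprocess wrapper 
(the names are illustrative, not the actual hook internals):

{code:python}
import logging
import subprocess

log = logging.getLogger(__name__)

def run_dataflow(cmd):
    log.info('Running command: %s', ' '.join(cmd))  # (1) log the command
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    for line in proc.stderr:
        # (2) surface subprocess errors at WARNING instead of DEBUG
        log.warning(line.decode().rstrip())
    proc.wait()
    return proc.returncode
{code}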



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1732) Improve Dataflow Hook Logging

2017-10-24 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1732.
--
Resolution: Fixed

> Improve Dataflow Hook Logging
> -
>
> Key: AIRFLOW-1732
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1732
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Trevor Edwards
>Assignee: Trevor Edwards
> Fix For: 1.10.0
>
>
> Logging output for Dataflow hook could be a bit more useful. Namely:
> # Log the command that is used for opening a dataflow subprocess
> # If the dataflow subprocess experiences an error, log that error at warning 
> (instead of debug)
> Currently, errors are extremely opaque, only showing the exit code. The 
> command used is unknown, and while the error is logged, it is at the debug 
> level, which makes it difficult to find.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1754) Add GCP logging download for Dataflow operator

2017-10-24 Thread Chris Riccomini (JIRA)
Chris Riccomini created AIRFLOW-1754:


 Summary: Add GCP logging download for Dataflow operator
 Key: AIRFLOW-1754
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1754
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chris Riccomini
 Fix For: 1.10.0


Based on conversation in AIRFLOW-1732 and 
https://github.com/apache/incubator-airflow/pull/2702, there is useful logging 
that occurs for Dataflow on the server-side (i.e. it's not visible simply by 
piping client logs to the Airflow log file).

We should add a method to fetch logs from GCP logging (Stackdriver?), so we 
can spool server-side logging into the Dataflow operator for debugging purposes.
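
A hedged sketch of what fetching those logs could look like with the 
google-cloud-logging client; the filter fields are assumptions about how 
Dataflow labels its Stackdriver entries:

{code:python}
from google.cloud import logging as gcp_logging

client = gcp_logging.Client(project='my-project')  # hypothetical project
log_filter = (
    'resource.type="dataflow_step" '
    'AND resource.labels.job_id="my-job-id"'  # hypothetical job id
)
for entry in client.list_entries(filter_=log_filter):
    print(entry.timestamp, entry.payload)
{code}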



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1750) GoogleCloudStorageToBigQueryOperator 404 HttpError

2017-10-23 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215848#comment-16215848
 ] 

Chris Riccomini commented on AIRFLOW-1750:
--

It looks to me like the project id is not being properly set. Have you checked 
your hook definition, service account, etc? The URL listed in the stack trace 
has two slashes after `projects`, indicating that no project_id was set.
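
One quick way to check, assuming the 1.8-era hook internals (where 
_get_field('project') reads the connection's 
extra__google_cloud_platform__project value):

{code:python}
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

hook = BigQueryHook(bigquery_conn_id='bigquery_default')
# If this prints None, the jobs URL becomes .../projects//jobs and 404s.
print(hook._get_field('project'))
{code}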

> GoogleCloudStorageToBigQueryOperator 404 HttpError
> --
>
> Key: AIRFLOW-1750
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1750
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: Airflow 1.8
> Environment: Python 2.7.13
>Reporter: Mark Secada
> Fix For: Airflow 1.8
>
>
> I'm trying to write a DAG which uploads JSON files to GoogleCloudStorage and 
> then moves them to BigQuery. I was able to upload these files to 
> GoogleCloudStorage, but when I run this second task, I get a 404 HttpError. 
> The error looks like this:
> {code:bash}
> ERROR - <HttpError 404 when requesting 
> https://www.googleapis.com/bigquery/v2/projects//jobs?alt=json returned "Not 
> Found">
> Traceback (most recent call last):
>   File 
> "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/models.py", line 
> 1374, in run
> result = task_copy.execute(context=context)
>   File 
> "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/contrib/operators/gcs_to_bq.py",
>  line 153, in execute
> schema_update_options=self.schema_update_options)
>   File 
> "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py",
>  line 476, in run_load
> return self.run_with_configuration(configuration)
>   File 
> "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py",
>  line 498, in run_with_configuration
> .insert(projectId=self.project_id, body=job_data) \
>   File 
> "/Users/myname/anaconda/lib/python2.7/site-packages/oauth2client/util.py", 
> line 135, in positional_wrapper
> return wrapped(*args, **kwargs)
>   File 
> "/Users/myname/anaconda/lib/python2.7/site-packages/googleapiclient/http.py", 
> line 838, in execute
> raise HttpError(resp, content, uri=self.uri)
> {code}
> My code for the task is here:
> {code:python}
> # Some comments here
> t3 = GoogleCloudStorageToBigQueryOperator(
>     task_id='move_'+source+'_from_gcs_to_bq',
>     bucket='mybucket',
>     source_objects=['news/latest_headline_'+source+'.json'],
>     destination_project_dataset_table='mydataset.latest_news_headlines',
>     schema_object='news/latest_headline_'+source+'.json',
>     source_format='NEWLINE_DELIMITED_JSON',
>     write_disposition='WRITE_APPEND',
>     dag=dag)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1728) Add networkUri, subnetworkUri and tags to DataprocClusterCreateOperator

2017-10-20 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1728.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Add networkUri, subnetworkUri and tags to DataprocClusterCreateOperator 
> 
>
> Key: AIRFLOW-1728
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1728
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jessica Chen Fan 
>Assignee: Jessica Chen Fan 
>Priority: Minor
> Fix For: 1.10.0
>
>
> Add ability to specify subnetwork and firewall tags for easier networking 
> when creating 
> dataproc clusters.
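
A hedged sketch of the resulting usage; the Python parameter names are 
assumptions derived from the camelCase config fields in the title:

{code:python}
from airflow.contrib.operators.dataproc_operator import (
    DataprocClusterCreateOperator,
)

create_cluster = DataprocClusterCreateOperator(
    task_id='create_cluster',
    cluster_name='my-cluster',  # hypothetical names throughout
    project_id='my-project',
    num_workers=2,
    subnetwork_uri='projects/my-project/regions/us-central1/subnetworks/my-subnet',
    tags=['allow-internal'],
    dag=dag,  # assumes an existing dag object
)
{code}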



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1731) Import custom config on PYTHONPATH

2017-10-20 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1731:
-
Priority: Blocker  (was: Major)

> Import custom config on PYTHONPATH
> --
>
> Key: AIRFLOW-1731
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1731
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.9.0
>Reporter: Fokko Driesprong
>Priority: Blocker
> Fix For: 1.9.0
>
>
> Currently the PYTHONPATH does not contain the required path to import a 
> custom config as described. This needs to be fixed and the instructions need 
> to be updated based on user feedback.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1330) Connection.parse_from_uri doesn't work for google_cloud_platform and so on

2017-10-19 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1330.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Connection.parse_from_uri doesn't work for google_cloud_platform and so on
> --
>
> Key: AIRFLOW-1330
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1330
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Yu Ishikawa
>Assignee: Shintaro Murakami
> Fix For: 1.10.0
>
>
> h2. Overview
> {{Connection.parse_from_uri}} doesn't work for some types like 
> {{google_cloud_platform}} whose type name includes underscores, since 
> {{urllib.parse.urlparse()}}, which is used in {{Connection.parse_from_uri}}, 
> doesn't support a scheme name that includes underscores.
> So, airflow's CLI doesn't work when a given connection URI includes 
> underscores, like {{google_cloud_platform://X}}.
> h3. Workaround
> https://medium.com/@yuu.ishikawa/apache-airflow-how-to-add-a-connection-to-google-cloud-with-cli-af2cc8df138d



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1723) Support sendgrid in email backend

2017-10-18 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1723.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Support sendgrid in email backend 
> --
>
> Key: AIRFLOW-1723
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1723
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: utils
>Affects Versions: 1.9.0
>Reporter: Feng Lu
>Assignee: Feng Lu
>Priority: Minor
> Fix For: 1.10.0
>
>
> The current Airflow email backend only supports SMTP; this PR aims to extend 
> the email backend and integrate with SendGrid. The Airflow config file is 
> also updated to include a [sendgrid] section where users can specify api_key 
> and mail_from. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1506) Improve DataprocClusterCreateOperator

2017-10-18 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1506.
--
Resolution: Duplicate

> Improve DataprocClusterCreateOperator
> -
>
> Key: AIRFLOW-1506
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1506
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, operators
>Reporter: Yu Ishikawa
>Assignee: Jessica Chen Fan 
>
> h2. Goals
> {{DataprocClusterCreateOperator}} should support 
> {{$.gceClusterConfig.serviceAccountScopes}} to specify scopes for a Dataproc 
> cluster.
> For example, I guess some users would like to store the result to Google 
> Datastore and the like. In such a case, we have to put additional scopes to 
> access the outputs from the Dataproc cluster.
> https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#gceclusterconfig



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1718) Increase num_retries polling value on Dataproc hook

2017-10-18 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1718.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Increase num_retries polling value on Dataproc hook
> ---
>
> Key: AIRFLOW-1718
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1718
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Crystal Qian
>Assignee: Crystal Qian
>Priority: Minor
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, `num_retries = 0` when execute() is called 
> (https://google.github.io/google-api-python-client/docs/epy/googleapiclient.http.HttpRequest-class.html#execute),
>  which causes intermittent 500 errors 
> (https://stackoverflow.com/questions/46522261/deadline-exceeded-when-airflow-runs-spark-jobs).
>  We should increase this to allow retries for internal Dataproc queries to 
> other services in the short-term; also seeing if the `num_retries` count can 
> be increased at the _google-api-python-client_ level in the long-term.
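
The change boils down to passing a non-zero retry count to execute(), which 
googleapiclient supports natively; a minimal sketch against the raw client 
(project and job ids are illustrative):

{code:python}
from googleapiclient.discovery import build

service = build('dataproc', 'v1')  # credentials resolved from the environment
job = (
    service.projects().regions().jobs()
    .get(projectId='my-project', region='global', jobId='my-job')
    .execute(num_retries=5)  # the default is num_retries=0, i.e. fail fast
)
{code}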



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1727) Add unit tests for DataProcHook

2017-10-18 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1727:
-
Fix Version/s: 1.10.0

> Add unit tests for DataProcHook
> ---
>
> Key: AIRFLOW-1727
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1727
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Crystal Qian
>Assignee: Crystal Qian
>Priority: Minor
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1727) Add unit tests for DataProcHook

2017-10-18 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1727.
--
Resolution: Fixed

> Add unit tests for DataProcHook
> ---
>
> Key: AIRFLOW-1727
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1727
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Crystal Qian
>Assignee: Crystal Qian
>Priority: Minor
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1683) Cancel pending GCP Big Query job when task times out

2017-10-16 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1683.
--
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

> Cancel pending GCP Big Query job when task times out
> 
>
> Key: AIRFLOW-1683
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1683
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib
>Affects Versions: Airflow 2.0
>Reporter: Feng Lu
>Assignee: Feng Lu
>Priority: Minor
> Fix For: 1.10.0
>
>
> When a BigQuery task times out, the pending BigQuery job should be cancelled. 
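
A hedged sketch of the cancellation call against the BigQuery v2 API; how the 
hook wires this into the timeout path may differ:

{code:python}
from googleapiclient.discovery import build

service = build('bigquery', 'v2')  # credentials resolved from the environment
service.jobs().cancel(
    projectId='my-project',  # hypothetical ids
    jobId='my-job-id',
).execute()
{code}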



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-976) Mark success running task causes it to fail

2017-10-16 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-976.
-
Resolution: Fixed

Fixed in 
https://github.com/apache/incubator-airflow/commit/b2e1753f5b74ad1b6e0889f7b784ce69623c95ce

> Mark success running task causes it to fail
> ---
>
> Key: AIRFLOW-976
> URL: https://issues.apache.org/jira/browse/AIRFLOW-976
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Dan Davydov
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Marking success on a running task in the UI causes it to fail.
> Expected Behavior:
> Task instance is killed and marked as successful
> Actual Behavior:
> Task instance is killed and marked as failed
> [~saguziel] [~bolke]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-10-13 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204373#comment-16204373
 ] 

Chris Riccomini commented on AIRFLOW-1613:
--

Ended up having to revert this due to the following error:

{noformat}
Try 5 out of 4
Exception:
must be string or buffer, not None
Log: Link
{noformat}

Stack trace indicates a mysql_to_gcs.py issue. Apparently some VARCHAR fields 
have the `binary` flag set in the description_flags field. This was unexpected.

> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation will return an empty list after the 
> first iteration of schema:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened as binary, but strings are written to it, producing the 
> error `a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3.
> The operator currently does not support binary columns in MySQL. We should 
> support uploading binary columns from MySQL to cloud storage, as it's a 
> pretty common use-case. 
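
A minimal sketch of the first two fixes, with stand-ins for the cursor so it 
runs on its own:

{code:python}
from tempfile import NamedTemporaryFile

cursor_description = [('id',), ('name',)]  # stand-in for cursor.description
rows = [(1, 'a'), (2, 'b')]                # stand-in for the cursor rows

# (1) materialize the map() iterator so it survives repeated use
schema = list(map(lambda schema_tuple: schema_tuple[0], cursor_description))

# (2) open in text mode so writing str does not raise TypeError
tmp_file_handle = NamedTemporaryFile(mode='w', delete=True)
for row in rows:
    row_dict = dict(zip(schema, row))
    tmp_file_handle.write(str(row_dict) + '\n')
{code}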



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-10-13 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini reopened AIRFLOW-1613:
--

> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation will return an empty list after the 
> first iteration of schema:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened as binary, but strings are written to it, producing the 
> error `a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3.
> The operator currently does not support binary columns in MySQL. We should 
> support uploading binary columns from MySQL to cloud storage, as it's a 
> pretty common use-case. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1714) "Separated" is misspelled in admin/connections tab

2017-10-13 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1714.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> "Separated" is misspelled in admin/connections tab
> --
>
> Key: AIRFLOW-1714
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1714
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: Airflow 2.0
>Reporter: William Pursell
>Assignee: William Pursell
>Priority: Trivial
> Fix For: 1.10.0
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1710) Add timezone setting to Airflow DAG

2017-10-12 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202511#comment-16202511
 ] 

Chris Riccomini commented on AIRFLOW-1710:
--

Another workaround that people are using for this issue:

https://stackoverflow.com/questions/43662571/how-to-properly-handle-daylight-savings-time-in-apache-airflow/43664910#43664910

> Add timezone setting to Airflow DAG
> ---
>
> Key: AIRFLOW-1710
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1710
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.9.0
>Reporter: Chris Riccomini
>
> We have some use cases where we'd like to run DAGs pegged to a specific 
> timezone.
> Customers want things to happen in their time zone. If they have DST, it's 
> not a constant offset from UTC. If they aren't in the US, their DST isn't a 
> constant offset from California's DST (where we run). GB, for example.
> One way to solve this would be to have the DAG start with a PythonOperator 
> task that calculates the difference between UTC and the expected timezone, 
> and sleeps for that amount of time.
> Another (cleaner?) way would be to add a field to the DAG model that allows 
> the DAG author to specify the timezone that the DAG should be scheduled in. 
> For example, we could have our Airflow box continue to run on UTC, but 
> schedule a specific DAG for Pacific/US, which would adjust according to 
> daylight savings time. We could schedule other DAGs to run on GB DST, etc.
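
A hedged sketch of the sleep-based variant described above, using pytz for the 
DST-aware offset (the zone and DAG wiring are illustrative):

{code:python}
import time

import pytz
from airflow.operators.python_operator import PythonOperator

def wait_for_local_time(**context):
    # The offset between UTC and the target zone varies with DST,
    # so compute it per run from the execution date, not a constant.
    tz = pytz.timezone('Europe/London')
    offset = tz.utcoffset(context['execution_date']).total_seconds()
    if offset > 0:
        time.sleep(offset)

delay = PythonOperator(
    task_id='wait_for_local_time',
    python_callable=wait_for_local_time,
    provide_context=True,  # Airflow 1.x style
    dag=dag,  # assumes an existing dag object
)
{code}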



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1696) GCP Dataproc Operator fails due to '+' in Airflow version

2017-10-11 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1696.
--
Resolution: Fixed

> GCP Dataproc Operator fails due to '+' in Airflow version
> -
>
> Key: AIRFLOW-1696
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1696
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Trevor Edwards
>Assignee: Trevor Edwards
> Fix For: 1.10.0
>
>
> Since the Dataproc operator attaches the Airflow version as one of its labels, 
> and Dataproc labels cannot include the character '+', the Dataproc operator 
> currently fails with the following error:
> {code:none}
> [2017-10-09 19:28:48,035] {base_task_runner.py:115} INFO - Running: ['bash', 
> '-c', u'airflow run smokey-dataproc start_cluster 2017-10-08T00:00:00 
> --job_id 6 --raw -sd DAGS_FOLDER/dataa.py']
> [2017-10-09 19:28:49,041] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:49,041] {__init__.py:45} INFO - Using executor LocalExecutor
> [2017-10-09 19:28:49,139] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:49,139] {models.py:187} INFO - Filling up the DagBag from 
> /home/airflow/dags/dataa.py
> [2017-10-09 19:28:49,258] {base_task_runner.py:98} INFO - Subtask: Cluster 
> name: smoke-cluster-aa05845a-1b60-4543-94d0-f7dfddb90ee0
> [2017-10-09 19:28:49,258] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:49,257] {dataproc_operator.py:267} INFO - Creating cluster: 
> smoke-cluster-aa05845a-1b60-4543-94d0-f7dfddb90ee0
> [2017-10-09 19:28:49,265] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:49,265] {gcp_api_base_hook.py:82} INFO - Getting connection 
> using a JSON key file.
> [2017-10-09 19:28:59,909] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:59,906] {models.py:1564} ERROR - <HttpError 400 when requesting 
> https://dataproc.googleapis.com/v1/projects/cloud-airflow-test/regions/global/clusters?alt=json
>  returned "Multiple validation errors:
> [2017-10-09 19:28:59,910] {base_task_runner.py:98} INFO - Subtask:  - Not a 
> valid value: "v1-10-0dev0+incubating". Only lowercase letters, numbers, and 
> dashes are allowed. The value must start with lowercase letter or number and 
> end with a lowercase letter or number.
> [2017-10-09 19:28:59,910] {base_task_runner.py:98} INFO - Subtask:  - User 
> label value must conform to '[\p{Ll}\p{Lo}\p{N}_-]{0,63}' pattern">
> [2017-10-09 19:28:59,911] {base_task_runner.py:98} INFO - Subtask: Traceback 
> (most recent call last):
> [2017-10-09 19:28:59,911] {base_task_runner.py:98} INFO - Subtask:   File 
> "/home/airflow/incubator-airflow/airflow/models.py", line 1462, in 
> _run_raw_task
> [2017-10-09 19:28:59,911] {base_task_runner.py:98} INFO - Subtask: result 
> = task_copy.execute(context=context)
> [2017-10-09 19:28:59,912] {base_task_runner.py:98} INFO - Subtask:   File 
> "/home/airflow/incubator-airflow/airflow/contrib/operators/dataproc_operator.py",
>  line 300, in execute
> [2017-10-09 19:28:59,912] {base_task_runner.py:98} INFO - Subtask: raise e
> [2017-10-09 19:28:59,913] {base_task_runner.py:98} INFO - Subtask: HttpError: 
>  <HttpError 400 when requesting 
> https://dataproc.googleapis.com/v1/projects/cloud-airflow-test/regions/global/clusters?alt=json
>  returned "Multiple validation errors:
> [2017-10-09 19:28:59,914] {base_task_runner.py:98} INFO - Subtask:  - Not a 
> valid value: "v1-10-0dev0+incubating". Only lowercase letters, numbers, and 
> dashes are allowed. The value must start with lowercase letter or number and 
> end with a lowercase letter or number.
> [2017-10-09 19:28:59,914] {base_task_runner.py:98} INFO - Subtask:  - User 
> label value must conform to '[\p{Ll}\p{Lo}\p{N}_-]{0,63}' pattern">
> [2017-10-09 19:28:59,915] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:59,909] {models.py:1585} INFO - Marking task as UP_FOR_RETRY
> [2017-10-09 19:28:59,978] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:59,977] {models.py:1613} ERROR - <HttpError 400 when requesting 
> https://dataproc.googleapis.com/v1/projects/cloud-airflow-test/regions/global/clusters?alt=json
>  returned "Multiple validation errors:
> [2017-10-09 19:28:59,978] {base_task_runner.py:98} INFO - Subtask:  - Not a 
> valid value: "v1-10-0dev0+incubating". Only lowercase letters, numbers, and 
> dashes are allowed. The value must start with lowercase letter or number and 
> end with a lowercase letter or number.
> [2017-10-09 19:28:59,979] {base_task_runner.py:98} INFO - Subtask:  - User 
> label value must conform to '[\p{Ll}\p{Lo}\p{N}_-]{0,63}' pattern">
> [2017-10-09 19:28:59,979] {base_task_runner.py:98} INFO - Subtask: Traceback 
> (most recent call last):
> [2017-10-09 19:28:59,979] {base_task_runner.py:98} INFO - Subtask:   File 
> "/usr/local/bin/airflow", line 6, in 
> [2017-10-09 

[jira] [Updated] (AIRFLOW-1696) GCP Dataproc Operator fails due to '+' in Airflow version

2017-10-11 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1696:
-
Fix Version/s: 1.10.0

> GCP Dataproc Operator fails due to '+' in Airflow version
> -
>
> Key: AIRFLOW-1696
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1696
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Trevor Edwards
>Assignee: Trevor Edwards
> Fix For: 1.10.0
>
>
> Since the Dataproc operator attaches the Airflow version as one of its labels, 
> and Dataproc labels cannot include the character '+', the Dataproc operator 
> currently fails with the following error:
> {code:none}
> [2017-10-09 19:28:48,035] {base_task_runner.py:115} INFO - Running: ['bash', 
> '-c', u'airflow run smokey-dataproc start_cluster 2017-10-08T00:00:00 
> --job_id 6 --raw -sd DAGS_FOLDER/dataa.py']
> [2017-10-09 19:28:49,041] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:49,041] {__init__.py:45} INFO - Using executor LocalExecutor
> [2017-10-09 19:28:49,139] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:49,139] {models.py:187} INFO - Filling up the DagBag from 
> /home/airflow/dags/dataa.py
> [2017-10-09 19:28:49,258] {base_task_runner.py:98} INFO - Subtask: Cluster 
> name: smoke-cluster-aa05845a-1b60-4543-94d0-f7dfddb90ee0
> [2017-10-09 19:28:49,258] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:49,257] {dataproc_operator.py:267} INFO - Creating cluster: 
> smoke-cluster-aa05845a-1b60-4543-94d0-f7dfddb90ee0
> [2017-10-09 19:28:49,265] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:49,265] {gcp_api_base_hook.py:82} INFO - Getting connection 
> using a JSON key file.
> [2017-10-09 19:28:59,909] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:59,906] {models.py:1564} ERROR - <HttpError 400 when requesting 
> https://dataproc.googleapis.com/v1/projects/cloud-airflow-test/regions/global/clusters?alt=json
>  returned "Multiple validation errors:
> [2017-10-09 19:28:59,910] {base_task_runner.py:98} INFO - Subtask:  - Not a 
> valid value: "v1-10-0dev0+incubating". Only lowercase letters, numbers, and 
> dashes are allowed. The value must start with lowercase letter or number and 
> end with a lowercase letter or number.
> [2017-10-09 19:28:59,910] {base_task_runner.py:98} INFO - Subtask:  - User 
> label value must conform to '[\p{Ll}\p{Lo}\p{N}_-]{0,63}' pattern">
> [2017-10-09 19:28:59,911] {base_task_runner.py:98} INFO - Subtask: Traceback 
> (most recent call last):
> [2017-10-09 19:28:59,911] {base_task_runner.py:98} INFO - Subtask:   File 
> "/home/airflow/incubator-airflow/airflow/models.py", line 1462, in 
> _run_raw_task
> [2017-10-09 19:28:59,911] {base_task_runner.py:98} INFO - Subtask: result 
> = task_copy.execute(context=context)
> [2017-10-09 19:28:59,912] {base_task_runner.py:98} INFO - Subtask:   File 
> "/home/airflow/incubator-airflow/airflow/contrib/operators/dataproc_operator.py",
>  line 300, in execute
> [2017-10-09 19:28:59,912] {base_task_runner.py:98} INFO - Subtask: raise e
> [2017-10-09 19:28:59,913] {base_task_runner.py:98} INFO - Subtask: HttpError: 
>  <HttpError 400 when requesting 
> https://dataproc.googleapis.com/v1/projects/cloud-airflow-test/regions/global/clusters?alt=json
>  returned "Multiple validation errors:
> [2017-10-09 19:28:59,914] {base_task_runner.py:98} INFO - Subtask:  - Not a 
> valid value: "v1-10-0dev0+incubating". Only lowercase letters, numbers, and 
> dashes are allowed. The value must start with lowercase letter or number and 
> end with a lowercase letter or number.
> [2017-10-09 19:28:59,914] {base_task_runner.py:98} INFO - Subtask:  - User 
> label value must conform to '[\p{Ll}\p{Lo}\p{N}_-]{0,63}' pattern">
> [2017-10-09 19:28:59,915] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:59,909] {models.py:1585} INFO - Marking task as UP_FOR_RETRY
> [2017-10-09 19:28:59,978] {base_task_runner.py:98} INFO - Subtask: 
> [2017-10-09 19:28:59,977] {models.py:1613} ERROR - <HttpError 400 when requesting 
> https://dataproc.googleapis.com/v1/projects/cloud-airflow-test/regions/global/clusters?alt=json
>  returned "Multiple validation errors:
> [2017-10-09 19:28:59,978] {base_task_runner.py:98} INFO - Subtask:  - Not a 
> valid value: "v1-10-0dev0+incubating". Only lowercase letters, numbers, and 
> dashes are allowed. The value must start with lowercase letter or number and 
> end with a lowercase letter or number.
> [2017-10-09 19:28:59,979] {base_task_runner.py:98} INFO - Subtask:  - User 
> label value must conform to '[\p{Ll}\p{Lo}\p{N}_-]{0,63}' pattern">
> [2017-10-09 19:28:59,979] {base_task_runner.py:98} INFO - Subtask: Traceback 
> (most recent call last):
> [2017-10-09 19:28:59,979] {base_task_runner.py:98} INFO - Subtask:   File 
> "/usr/local/bin/airflow", line 6, in 
> [2017-10-09 
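The validation failure above comes from the Airflow version string
{{v1-10-0dev0+incubating}} being attached verbatim as a Dataproc cluster
label value; label values allow only lowercase letters, numbers, underscores,
and dashes, and must start and end with a lowercase letter or number. A
minimal sketch of the kind of sanitization that would avoid this (the helper
is hypothetical, not the actual fix):

{code}
import re

def sanitize_label_value(value):
    # Dataproc label values must match [\p{Ll}\p{Lo}\p{N}_-]{0,63}.
    value = value.lower()
    value = re.sub(r'[^a-z0-9_-]', '-', value)  # e.g. '+' and '.' become '-'
    return value[:63].strip('-_')

# sanitize_label_value('v1-10-0dev0+incubating') -> 'v1-10-0dev0-incubating'
{code}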

[jira] [Resolved] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-10-11 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1613.
--
Resolution: Fixed

> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation exhausts schema after the first row, 
> so every subsequent zip(schema, row) produces an empty dict:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
> # Convert datetime objects to utc seconds, and decimals to floats
> row = map(self.convert_types, row)
> row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened in binary mode, but strings are written to it, raising 
> the error `a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3.
> The operator currently does not support binary columns in MySQL. We should 
> support uploading binary columns from MySQL to Cloud Storage, as it's a 
> pretty common use case.
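A minimal sketch of the three fixes, reusing the names from the snippet above
({{cursor}} is assumed to be an open MySQLdb cursor, as in the snippet; the
binary handling is one possible approach, not the actual patch):

{code}
import base64
from tempfile import NamedTemporaryFile

def convert_binary(value):
    # (3) Encode binary column values so they survive text/JSON output.
    if isinstance(value, bytes):
        return base64.standard_b64encode(value).decode('ascii')
    return value

# (1) Materialize the schema as a list so it can be zipped once per row.
schema = list(map(lambda schema_tuple: schema_tuple[0], cursor.description))
# (2) Open in text mode, since strings are written to the file.
tmp_file_handle = NamedTemporaryFile(mode='w', delete=True)

for row in cursor:
    row = map(convert_binary, row)
    row_dict = dict(zip(schema, row))
{code}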



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1691) Add better documentation for Google cloud storage logging

2017-10-09 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1691.
--
Resolution: Fixed

> Add better documentation for Google cloud storage logging
> -
>
> Key: AIRFLOW-1691
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1691
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp, logging
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
> Fix For: 1.9.0
>
>
> The documentation for the new logging changes is very difficult to follow. 
> I've added very explicit instructions specifically for Google cloud storage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1691) Add better documentation for Google cloud storage logging

2017-10-06 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194999#comment-16194999
 ] 

Chris Riccomini commented on AIRFLOW-1691:
--

https://github.com/apache/incubator-airflow/pull/2671

> Add better documentation for Google cloud storage logging
> -
>
> Key: AIRFLOW-1691
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1691
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp, logging
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
> Fix For: 1.9.0
>
>
> The documentation for the new logging changes is very difficult to follow. 
> I've added very explicit instructions specifically for Google cloud storage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1691) Add better documentation for Google cloud storage logging

2017-10-06 Thread Chris Riccomini (JIRA)
Chris Riccomini created AIRFLOW-1691:


 Summary: Add better documentation for Google cloud storage logging
 Key: AIRFLOW-1691
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1691
 Project: Apache Airflow
  Issue Type: Improvement
  Components: gcp, logging
Reporter: Chris Riccomini
Assignee: Chris Riccomini
 Fix For: 1.9.0


The documentation for the new logging changes is very difficult to follow. 
I've added very explicit instructions specifically for Google cloud storage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1690) Error messages regarding gcs log commits are sparse

2017-10-06 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1690:
-
Fix Version/s: 1.9.0

> Error messages regarding gcs log commits are sparse
> ---
>
> Key: AIRFLOW-1690
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1690
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.9.0
>Reporter: William Pursell
>Assignee: William Pursell
>Priority: Minor
> Fix For: 1.9.0
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> Whether there is a local error creating a temporary file, or a connection 
> error, the log message reads: "Could not write logs to %s" % 
> remote_log_location
> This is not enough information to debug an error.
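A sketch of a more informative message (the hook and its load_string() call
are placeholders for the real task handler's upload path; the point is
attaching the underlying exception):

{code}
import logging

log = logging.getLogger(__name__)

def write_remote_logs(hook, log_contents, remote_log_location):
    try:
        hook.load_string(log_contents, remote_log_location)  # placeholder call
    except Exception:
        # exc_info=True records the traceback, so a connection failure is
        # distinguishable from a local temporary-file error.
        log.error('Could not write logs to %s', remote_log_location,
                  exc_info=True)
{code}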



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1682) S3 task handler never writes to S3

2017-10-06 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1682.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

> S3 task handler never writes to S3
> --
>
> Key: AIRFLOW-1682
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1682
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Arthur Vigil
>Assignee: Arthur Vigil
> Fix For: 1.9.0
>
>
> S3TaskHandler has the same problem as the GCSTaskHandler reported in 
> AIRFLOW-1676, where the log never gets uploaded because _hook is never set
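Both handlers share the same shape of bug: close() consults a {{_hook}}
attribute that no code path ever assigns, so the upload branch is never
reached. A minimal sketch of a lazy-initialization fix (class and factory
names are placeholders, not the actual patch):

{code}
class RemoteTaskHandler(object):

    def __init__(self, remote_base):
        self.remote_base = remote_base
        self._hook = None

    def _build_hook(self):
        # Placeholder: construct and return the S3/GCS hook here.
        raise NotImplementedError

    @property
    def hook(self):
        # Create the storage hook on first use instead of reading an
        # attribute that nothing ever sets.
        if self._hook is None:
            self._hook = self._build_hook()
        return self._hook

    def close(self):
        # Before the fix, 'if self._hook is None: return' was always taken.
        if self.hook is None:
            return
        # ... flush and upload the local log via self.hook ...
{code}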



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1635) Allow creating Google Cloud Platform connection without requiring a JSON file

2017-10-03 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1635:
-
Fix Version/s: (was: 1.10.0)
   1.9.0

> Allow creating Google Cloud Platform connection without requiring a JSON file
> -
>
> Key: AIRFLOW-1635
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1635
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 1.8.2
>Reporter: Barry Hart
>Assignee: Barry Hart
> Fix For: 1.9.0
>
> Attachments: AIRFLOW-1635.png
>
>
> Most connection types can be created purely from the Airflow UI. Google Cloud 
> Platform connections do not support this, because they require a JSON file to 
> be present on disk. This is awkward for users who only have UI access (i.e. 
> no direct access to the server's file system) because it requires 
> coordination with a system administrator to add or update a connection.
> I propose that Airflow offer two ways to set up a Google cloud connection:
> * The current method of placing a file on disk and entering its path.
> * New method where the Airflow user/administrator pastes the JSON *contents* 
> into the Airflow UI. This will be a new field in the UI.
> If both a path and JSON data are provided, the path will take precedence. 
> This is somewhat arbitrary; typically only one field would contain a value.
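A sketch of how the hook side could branch on the two fields (using the
google-auth library for illustration; the parameter names and the actual
change may differ):

{code}
import json

from google.oauth2 import service_account

def get_credentials(key_path=None, keyfile_json=None):
    # Per the proposal above, a path on disk takes precedence over pasted JSON.
    if key_path:
        return service_account.Credentials.from_service_account_file(key_path)
    if keyfile_json:
        info = json.loads(keyfile_json)
        return service_account.Credentials.from_service_account_info(info)
    raise ValueError('Provide either a JSON key file path or its contents')
{code}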



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (AIRFLOW-1676) GCS task handler never writes to GCS

2017-10-03 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini reassigned AIRFLOW-1676:


Assignee: Chris Riccomini

> GCS task handler never writes to GCS
> 
>
> Key: AIRFLOW-1676
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1676
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
> Fix For: 1.9.0
>
>
> I discovered a bug in 1.9.0alpha0 with GCSTaskHandler. It seems that it is 
> impossible for this task handler to write to GCS. When close() is 
> called, it always returns immediately because _hook is never set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1676) GCS task handler never writes to GCS

2017-10-03 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190831#comment-16190831
 ] 

Chris Riccomini commented on AIRFLOW-1676:
--

https://github.com/apache/incubator-airflow/pull/2659

> GCS task handler never writes to GCS
> 
>
> Key: AIRFLOW-1676
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1676
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chris Riccomini
> Fix For: 1.9.0
>
>
> I discovered a bug in 1.9.0alpha0 with GCSTaskHandler. It seems that it is 
> impossible for this task handler to write to GCS. When close() is 
> called, it always returns immediately because _hook is never set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1676) GCS task handler never writes to GCS

2017-10-03 Thread Chris Riccomini (JIRA)
Chris Riccomini created AIRFLOW-1676:


 Summary: GCS task handler never writes to GCS
 Key: AIRFLOW-1676
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1676
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Chris Riccomini
 Fix For: 1.9.0


I discovered a bug in 1.9.0alpha0 with GCSTaskHandler. It seems that it is 
impossible for this task handler to write to GCS. When close() is called, it 
always returns immediately because _hook is never set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1671) Missing @apply_defaults annotation for gcs download operator

2017-10-02 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1671.
--
Resolution: Fixed

> Missing @apply_defaults annotation for gcs download operator
> 
>
> Key: AIRFLOW-1671
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1671
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.9.0
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> The @apply_defaults annotation appears to have been accidentally removed in 
> a previous PR. It should be added back.
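For reference, the decorator belongs on the operator's constructor so that
DAG-level default_args are merged into the keyword arguments; a minimal
sketch of its placement (constructor body abbreviated):

{code}
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class GoogleCloudStorageDownloadOperator(BaseOperator):

    @apply_defaults  # without this, default_args from the DAG are not applied
    def __init__(self, bucket, object, *args, **kwargs):
        super(GoogleCloudStorageDownloadOperator, self).__init__(*args,
                                                                 **kwargs)
        self.bucket = bucket
        self.object = object
{code}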



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1671) Missing @apply_defaults annotation for gcs download operator

2017-10-02 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1671:
-
Summary: Missing @apply_defaults annotation for gcs download operator  
(was: MIssing @apply_defaults annotation for gcs operator)

> Missing @apply_defaults annotation for gcs download operator
> 
>
> Key: AIRFLOW-1671
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1671
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.9.0
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> The @apply_defaults annotation appears to have been accidentally removed in 
> a previous PR. It should be added back.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-988) SLA Miss Callbacks Are Repeated if Email is Not being Used

2017-10-02 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-988.
-
Resolution: Fixed

> SLA Miss Callbacks Are Repeated if Email is Not being Used
> --
>
> Key: AIRFLOW-988
> URL: https://issues.apache.org/jira/browse/AIRFLOW-988
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Zachary Lawson
>Assignee: Charlie Jones
> Fix For: 1.9.0
>
>
> There is an issue in the current v1-8-stable branch. In the jobs.py module, 
> if the system does not have email set up but the DAG defines a 
> sla_miss_callback, that sla_miss_callback is repeated for that job 
> indefinitely for as long as the airflow scheduler is running. The offending 
> code is the query to the airflow meta database, which filters to sla_miss 
> records that have *either* email_sent or notification_sent as false ([see 
> lines 
> 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]),
>  but then executes the sla_miss_callback function regardless of whether 
> notification_sent was true ([see lines 
> 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]).
>  A conditional check on whether a notification has already been sent should 
> be added before executing the sla_miss_callback to prevent this.
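The proposed fix amounts to a guard around the callback invocation in
jobs.py; a minimal sketch (variable names follow the surrounding code, and
the exact callback signature is an assumption):

{code}
def maybe_run_sla_miss_callback(dag, slas, task_list, blocking_task_list,
                                blocking_tis):
    # Only fire the callback if at least one sla_miss row has not yet been
    # notified, then mark the rows so the scheduler loop does not repeat it.
    if dag.sla_miss_callback and any(not s.notification_sent for s in slas):
        dag.sla_miss_callback(dag, task_list, blocking_task_list, slas,
                              blocking_tis)
        for s in slas:
            s.notification_sent = True
{code}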



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1397) Airflow 1.8.1 - No data displays in Last Run Column in Airflow UI

2017-09-29 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1397:
-
Fix Version/s: 1.9.0

> Airflow 1.8.1 - No data displays in Last Run Column in Airflow UI
> -
>
> Key: AIRFLOW-1397
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1397
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, ui
>Affects Versions: 1.8.1
>Reporter: user_airflow
>Assignee: user_airflow
>Priority: Critical
> Fix For: 1.9.0
>
>
> We recently upgraded Airflow from 1.8.0 to 1.8.1. After upgrading, the Last 
> Run column in the Airflow UI started showing blank for all existing DAGs.
> Created a PR for this bug: 
> https://github.com/apache/incubator-airflow/pull/2430



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (AIRFLOW-1483) Page size on model views is too large to render quickly

2017-09-29 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini reopened AIRFLOW-1483:
--

> Page size on model views is too large to render quickly
> --
>
> Key: AIRFLOW-1483
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1483
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Edgar Rodriguez
>Assignee: Edgar Rodriguez
> Fix For: 1.9.0
>
> Attachments: taskinstance_page_loading_breakdown.png
>
>
> The current hardcoded value for {{page_size}} on {{AirflowModelView}} is 
> {{500}} rows, which is usually too large to render in less than 1-2 secs in 
> modern browsers. 
> Also, some endpoints take a long time to render the HTML content for 500 
> rows server-side, around 1-2 secs (on the server) or sometimes more.
> A simple approach is to reduce this value to something more sensible (50 
> maybe?). Making it configurable would also be a good option in case the 
> default is not good enough.
> See attachment for a profiled sample of a page loading time.
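Both suggestions can be combined; a sketch of a configurable default (the
[webserver] page_size option is hypothetical, not an existing Airflow
setting):

{code}
from airflow import configuration as conf

def get_page_size(default=50):
    # Read a hypothetical [webserver] page_size option, falling back to a
    # smaller, faster-rendering default than the hardcoded 500.
    try:
        return conf.getint('webserver', 'page_size')
    except Exception:
        return default
{code}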



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1483) Page size on model views is too large to render quickly

2017-09-29 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-1483:
-
Fix Version/s: 1.9.0

> Page size on model views is too large to render quickly
> --
>
> Key: AIRFLOW-1483
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1483
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Edgar Rodriguez
>Assignee: Edgar Rodriguez
> Fix For: 1.9.0
>
> Attachments: taskinstance_page_loading_breakdown.png
>
>
> The current hardcoded value for {{page_size}} on {{AirflowModelView}} is 
> {{500}} rows, which is usually too large to render in less than 1-2 secs in 
> modern browsers. 
> Also, some endpoints take a long time to render the HTML content for 500 
> rows server-side, around 1-2 secs (on the server) or sometimes more.
> A simple approach is to reduce this value to something more sensible (50 
> maybe?). Making it configurable would also be a good option in case the 
> default is not good enough.
> See attachment for a profiled sample of a page loading time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (AIRFLOW-1483) Page size on model views is too large to render quickly

2017-09-29 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-1483.

Resolution: Fixed

> Page size on model views is too large to render quickly
> --
>
> Key: AIRFLOW-1483
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1483
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Edgar Rodriguez
>Assignee: Edgar Rodriguez
> Fix For: 1.9.0
>
> Attachments: taskinstance_page_loading_breakdown.png
>
>
> The current hardcoded value for {{page_size}} on {{AirflowModelView}} is 
> {{500}} rows, which is usually too large to render in less than 1-2 secs in 
> modern browsers. 
> Also, some endpoints take a long time to render the HTML content for 500 
> rows server-side, around 1-2 secs (on the server) or sometimes more.
> A simple approach is to reduce this value to something more sensible (50 
> maybe?). Making it configurable would also be a good option in case the 
> default is not good enough.
> See attachment for a profiled sample of a page loading time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

