[jira] [Closed] (AIRFLOW-667) Handle 503 errors on google bigquery_hook

2016-12-14 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-667.
---
   Resolution: Fixed
Fix Version/s: Airflow 1.8

> Handle 503 errors on google bigquery_hook 
> --
>
> Key: AIRFLOW-667
> URL: https://issues.apache.org/jira/browse/AIRFLOW-667
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: Airflow 2.0, Airflow 1.7.1, Airflow 1.7.0
>Reporter: Krishnaveni Mettu
>Assignee: Krishnaveni Mettu
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> Getting 503 errors occasionally while querying the Google BigQuery job status.
> Here is the error message from the logs:
> [2016-11-30 11:11:46,125] {bigquery_hook.py:454} INFO - Waiting for job to complete: xx, job_xxx_
> [2016-11-30 11:11:51,131] {discovery.py:838} INFO - URL being requested: GET https://www.googleapis.com/bigquery/v2/projects//jobs/job_xx?alt=json
> [2016-11-30 11:12:55,170] {models.py:1321} ERROR - <HttpError 503 when requesting https://www.googleapis.com/bigquery/v2/projects//jobs/job_?alt=json returned "Error encountered during execution. Retrying may solve the problem.">
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1280, in run
> result = task_copy.execute(context=context)
>   File 
> "/usr/lib/python2.7/site-packages/airflow/contrib/operators/gcs_to_bq.py", 
> line 129, in execute
> num_retries=self.num_retries)
>   File 
> "/usr/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", 
> line 403, in run_load
> return self.run_with_configuration(configuration)
>   File 
> "/usr/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", 
> line 456, in run_with_configuration
> job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
>   File "/usr/lib/python2.7/site-packages/oauth2client/util.py", line 135, in 
> positional_wrapper
> return wrapped(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/googleapiclient/http.py", line 838, 
> in execute
> raise HttpError(resp, content, uri=self.uri)
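What the ticket asks for, in isolation, is to treat the transient 500/503 responses
from the jobs.get() poll as retryable. Below is a minimal standalone sketch of that
idea; it assumes the google-api-python-client HttpError attributes (err.resp.status)
and an already-built `jobs` resource, and the function name and parameters are
illustrative rather than the hook's actual API (the committed patch further down is
the real change).

{code:python}
import logging
import time

from googleapiclient.errors import HttpError


def poll_job_until_done(jobs, project_id, job_id, poll_interval=5):
    """Poll a BigQuery job until it is DONE, retrying transient 5xx errors.

    `jobs` is the resource returned by service.jobs() for an authorized
    BigQuery API client; 500/503 responses from jobs.get are treated as
    retryable instead of failing the task immediately.
    """
    while True:
        try:
            job = jobs.get(projectId=project_id, jobId=job_id).execute()
        except HttpError as err:
            if err.resp.status in (500, 503):
                logging.info('%s: retryable error, polling job %s again',
                             err.resp.status, job_id)
                time.sleep(poll_interval)
                continue
            raise  # non-retryable HTTP error, surface it to the caller
        if job['status']['state'] == 'DONE':
            if 'errorResult' in job['status']:
                raise Exception('BigQuery job failed. Final error was: {}'
                                .format(job['status']['errorResult']))
            return job
        logging.info('Waiting for job to complete: %s, %s', project_id, job_id)
        time.sleep(poll_interval)
{code}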





[jira] [Commented] (AIRFLOW-667) Handle 503 errors on google bigquery_hook

2016-12-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749477#comment-15749477
 ] 

ASF subversion and git services commented on AIRFLOW-667:
-

Commit cac133001b517dd7d66a31288dd375eb63d8cd26 in incubator-airflow's branch 
refs/heads/master from [~kmettu]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=cac1330 ]

[AIRFLOW-667] Handle BigQuery 503 error

Closes #1938 from krmettu/master


> Handle 503 errors on google bigquery_hook 
> --
>
> Key: AIRFLOW-667
> URL: https://issues.apache.org/jira/browse/AIRFLOW-667
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: Airflow 2.0, Airflow 1.7.1, Airflow 1.7.0
>Reporter: Krishnaveni Mettu
>Assignee: Krishnaveni Mettu
>Priority: Minor
>
> Getting 503 errors occasionally while querying the Google BigQuery job status.
> Here is the error message from the logs:
> [2016-11-30 11:11:46,125] {bigquery_hook.py:454} INFO - Waiting for job to complete: xx, job_xxx_
> [2016-11-30 11:11:51,131] {discovery.py:838} INFO - URL being requested: GET https://www.googleapis.com/bigquery/v2/projects//jobs/job_xx?alt=json
> [2016-11-30 11:12:55,170] {models.py:1321} ERROR - <HttpError 503 when requesting https://www.googleapis.com/bigquery/v2/projects//jobs/job_?alt=json returned "Error encountered during execution. Retrying may solve the problem.">
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1280, in run
> result = task_copy.execute(context=context)
>   File 
> "/usr/lib/python2.7/site-packages/airflow/contrib/operators/gcs_to_bq.py", 
> line 129, in execute
> num_retries=self.num_retries)
>   File 
> "/usr/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", 
> line 403, in run_load
> return self.run_with_configuration(configuration)
>   File 
> "/usr/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", 
> line 456, in run_with_configuration
> job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
>   File "/usr/lib/python2.7/site-packages/oauth2client/util.py", line 135, in 
> positional_wrapper
> return wrapped(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/googleapiclient/http.py", line 838, 
> in execute
> raise HttpError(resp, content, uri=self.uri)





incubator-airflow git commit: [AIRFLOW-667] Handle BigQuery 503 error

2016-12-14 Thread criccomini
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 67ab416db -> cac133001


[AIRFLOW-667] Handle BigQuery 503 error

Closes #1938 from krmettu/master


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/cac13300
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/cac13300
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/cac13300

Branch: refs/heads/master
Commit: cac133001b517dd7d66a31288dd375eb63d8cd26
Parents: 67ab416
Author: Krishnaveni Mettu 
Authored: Wed Dec 14 13:02:48 2016 -0800
Committer: Chris Riccomini 
Committed: Wed Dec 14 13:02:56 2016 -0800

--
 airflow/contrib/hooks/bigquery_hook.py | 37 +++--
 1 file changed, 24 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/cac13300/airflow/contrib/hooks/bigquery_hook.py
--
diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index a0cb71d..900ec12 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -470,21 +470,32 @@ class BigQueryBaseCursor(object):
             .insert(projectId=self.project_id, body=job_data) \
             .execute()
         job_id = query_reply['jobReference']['jobId']
-        job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
 
         # Wait for query to finish.
-        while not job['status']['state'] == 'DONE':
-            logging.info('Waiting for job to complete: %s, %s', self.project_id, job_id)
-            time.sleep(5)
-            job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
-
-        # Check if job had errors.
-        if 'errorResult' in job['status']:
-            raise Exception(
-                'BigQuery job failed. Final error was: {}. The job was: {}'.format(
-                    job['status']['errorResult'], job
-                )
-            )
+        keep_polling_job = True
+        while (keep_polling_job):
+            try:
+                job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
+                if (job['status']['state'] == 'DONE'):
+                    keep_polling_job = False
+                    # Check if job had errors.
+                    if 'errorResult' in job['status']:
+                        raise Exception(
+                            'BigQuery job failed. Final error was: {}. The job was: {}'.format(
+                                job['status']['errorResult'], job
+                            )
+                        )
+                else:
+                    logging.info('Waiting for job to complete : %s, %s', self.project_id, job_id)
+                    time.sleep(5)
+
+            except HttpError, err:
+                if err.code in [500, 503]:
+                    logging.info('%s: Retryable error, waiting for job to complete: %s', err.code, job_id)
+                    time.sleep(5)
+                else:
+                    raise Exception(
+                        'BigQuery job status check failed. Final error was: %s', err.code)
 
         return job_id
 
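A note on the design of the patch above: it retries 500/503 responses indefinitely
with a fixed five-second sleep, which is simple but will poll forever if the service
keeps returning errors. A bounded retry with exponential backoff is a common
alternative; the sketch below only illustrates that variant (the call_with_backoff
helper, its parameters, and RETRYABLE_STATUSES are not part of the commit), again
assuming the google-api-python-client HttpError attributes.

{code:python}
import time

from googleapiclient.errors import HttpError

RETRYABLE_STATUSES = (500, 503)


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0):
    """Run request_fn(), retrying retryable 5xx errors with exponential backoff.

    Sleeps base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last HttpError once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except HttpError as err:
            if err.resp.status not in RETRYABLE_STATUSES:
                raise
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Illustrative usage against the hook's polling call:
# job = call_with_backoff(
#     lambda: jobs.get(projectId=project_id, jobId=job_id).execute())
{code}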



[jira] [Created] (AIRFLOW-701) Add Lemann Foundation as an Airflow user

2016-12-14 Thread Fernando Paiva (JIRA)
Fernando Paiva created AIRFLOW-701:
--

 Summary: Add Lemann Foundation as an Airflow user
 Key: AIRFLOW-701
 URL: https://issues.apache.org/jira/browse/AIRFLOW-701
 Project: Apache Airflow
  Issue Type: Task
Reporter: Fernando Paiva
Priority: Trivial








[jira] [Resolved] (AIRFLOW-700) Default config link to web authentication is out of date

2016-12-14 Thread Alan Cruickshank (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Cruickshank resolved AIRFLOW-700.
--
Resolution: Implemented

Pull request pending

https://github.com/apache/incubator-airflow/pull/1943

> Default config link to web authentication is out of date
> 
>
> Key: AIRFLOW-700
> URL: https://issues.apache.org/jira/browse/AIRFLOW-700
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core, Documentation
>Affects Versions: Airflow 2.0, Airflow 1.8, Airflow 1.7.1
>Reporter: Alan Cruickshank
>Assignee: Alan Cruickshank
>Priority: Trivial
>  Labels: newbie
> Fix For: Airflow 2.0
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> In airflow/configuration.py, the default config file has an old URL for the
> web-authentication part of the documentation. The current link is:
> http://pythonhosted.org/airflow/installation.html#web-authentication
> which should be changed to
> http://pythonhosted.org/airflow/security.html#web-authentication





[jira] [Created] (AIRFLOW-700) Default config link to web authentication is out of date

2016-12-14 Thread Alan Cruickshank (JIRA)
Alan Cruickshank created AIRFLOW-700:


 Summary: Default config link to web authentication is out of date
 Key: AIRFLOW-700
 URL: https://issues.apache.org/jira/browse/AIRFLOW-700
 Project: Apache Airflow
  Issue Type: Bug
  Components: core, Documentation
Affects Versions: Airflow 2.0, Airflow 1.8, Airflow 1.7.1
Reporter: Alan Cruickshank
Assignee: Alan Cruickshank
Priority: Trivial
 Fix For: Airflow 2.0


In airflow/configuration.py, the default config file has an old URL for the
web-authentication part of the documentation. The current link is:

http://pythonhosted.org/airflow/installation.html#web-authentication

which should be changed to

http://pythonhosted.org/airflow/security.html#web-authentication
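For reference, this URL lives inside the default config template that
airflow/configuration.py embeds as a string. The excerpt below is only a hedged
illustration of the corrected comment; the DEFAULT_CONFIG name and the surrounding
[webserver] lines are recalled from the 1.7/1.8-era template and may not match it
exactly.

{code:python}
# Illustrative excerpt of the default config template in airflow/configuration.py
# (surrounding lines are approximate; only the URL change is the point here).
DEFAULT_CONFIG = """
[webserver]
# Set to true to turn on authentication:
# http://pythonhosted.org/airflow/security.html#web-authentication
authenticate = False
"""
{code}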





[jira] [Created] (AIRFLOW-699) Dag can't be triggered at the same second due to constraint on dag_id + execution_date

2016-12-14 Thread Olivier Girardot (JIRA)
Olivier Girardot created AIRFLOW-699:


 Summary: Dag can't be triggered at the same second due to 
constraint on dag_id + execution_date
 Key: AIRFLOW-699
 URL: https://issues.apache.org/jira/browse/AIRFLOW-699
 Project: Apache Airflow
  Issue Type: Bug
  Components: db
Affects Versions: Airflow 1.7.1.3
Reporter: Olivier Girardot


We have a system that triggers DAGs when several files arrive in HDFS. We craft a
distinct run_id to trace each trigger, but the schema of the dag_run table is:

{code:sql}
CREATE TABLE `dag_run` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dag_id` varchar(250) DEFAULT NULL,
  `execution_date` datetime DEFAULT NULL,
  `state` varchar(50) DEFAULT NULL,
  `run_id` varchar(250) DEFAULT NULL,
  `external_trigger` tinyint(1) DEFAULT NULL,
  `conf` blob,
  `end_date` datetime DEFAULT NULL,
  `start_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `dag_id` (`dag_id`,`execution_date`),
  UNIQUE KEY `dag_id_2` (`dag_id`,`run_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2998 DEFAULT CHARSET=latin1 {code}

We end up with a duplicate-entry exception:
{noformat}
sqlalchemy.exc.IntegrityError: (_mysql_exceptions.IntegrityError) (1062, 
"Duplicate entry 'my-job-2016-12-13 19:52:33' for key 'dag_id'") [SQL: u'INSERT 
INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, 
external_trigger, conf) VALUES (%s, %s, now(), %s, %s, %s, %s, %s)'] 
[parameters: ('my-job', datetime.datetime(2016, 12, 13, 19, 52, 33, 210790), 
None, u'running', 
'my-job-custom-run-id_2016-12-13T19:52:32.291_785a2860-c622-47f6-a29c-4c6394f931fa',
 1, "...")
{noformat}

Is there any need for this constraint? The datetime precision (whole seconds) is
problematic for us because it is common for a DAG to be triggered more than once
within the same second.
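One possible mitigation, for reference only and not necessarily how this ticket will
be resolved: MySQL 5.6.4+ supports fractional-second DATETIME columns, so storing
execution_date with microsecond precision would stop two triggers in the same second
from colliding on the (dag_id, execution_date) unique key. The table and column names
below come from the schema above; the change itself is an assumption about a
workaround, and any other table keyed on execution_date (e.g. task_instance) would
need the same treatment.

{code:sql}
-- Assumes MySQL >= 5.6.4 (fractional-second DATETIME support).
ALTER TABLE dag_run
  MODIFY execution_date DATETIME(6) DEFAULT NULL;
{code}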



