[jira] [Closed] (AIRFLOW-667) Handle 503 errors on google bigquery_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini closed AIRFLOW-667.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: Airflow 1.8

> Handle 503 errors on google bigquery_hook
> -----------------------------------------
>
>                 Key: AIRFLOW-667
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-667
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: gcp
>    Affects Versions: Airflow 2.0, Airflow 1.7.1, Airflow 1.7.0
>            Reporter: Krishnaveni Mettu
>            Assignee: Krishnaveni Mettu
>            Priority: Minor
>             Fix For: Airflow 1.8
>
> Getting 503 errors occasionally while querying Google BigQuery job status.
> Here is the error message from the logs:
> [2016-11-30 11:11:46,125] {bigquery_hook.py:454} INFO - Waiting for job to complete: xx, job_xxx_
> [2016-11-30 11:11:51,131] {discovery.py:838} INFO - URL being requested: GET https://www.googleapis.com/bigquery/v2/projects//jobs/job_xx?alt=json
> [2016-11-30 11:12:55,170] {models.py:1321} ERROR - requesting https://www.googleapis.com/bigquery/v2/projects//jobs/job_?alt=json returned "Error encountered during execution. Retrying may solve the problem."
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1280, in run
>     result = task_copy.execute(context=context)
>   File "/usr/lib/python2.7/site-packages/airflow/contrib/operators/gcs_to_bq.py", line 129, in execute
>     num_retries=self.num_retries)
>   File "/usr/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 403, in run_load
>     return self.run_with_configuration(configuration)
>   File "/usr/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 456, in run_with_configuration
>     job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
>   File "/usr/lib/python2.7/site-packages/oauth2client/util.py", line 135, in positional_wrapper
>     return wrapped(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/googleapiclient/http.py", line 838, in execute
>     raise HttpError(resp, content, uri=self.uri)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-667) Handle 503 errors on google bigquery_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749477#comment-15749477 ]

ASF subversion and git services commented on AIRFLOW-667:
---------------------------------------------------------

Commit cac133001b517dd7d66a31288dd375eb63d8cd26 in incubator-airflow's branch refs/heads/master from [~kmettu]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=cac1330 ]

[AIRFLOW-667] Handle BigQuery 503 error

Closes #1938 from krmettu/master
incubator-airflow git commit: [AIRFLOW-667] Handle BigQuery 503 error
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 67ab416db -> cac133001

[AIRFLOW-667] Handle BigQuery 503 error

Closes #1938 from krmettu/master

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/cac13300
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/cac13300
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/cac13300

Branch: refs/heads/master
Commit: cac133001b517dd7d66a31288dd375eb63d8cd26
Parents: 67ab416
Author: Krishnaveni Mettu
Authored: Wed Dec 14 13:02:48 2016 -0800
Committer: Chris Riccomini
Committed: Wed Dec 14 13:02:56 2016 -0800

----------------------------------------------------------------------
 airflow/contrib/hooks/bigquery_hook.py | 37 +++--
 1 file changed, 24 insertions(+), 13 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/cac13300/airflow/contrib/hooks/bigquery_hook.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index a0cb71d..900ec12 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -470,21 +470,32 @@ class BigQueryBaseCursor(object):
             .insert(projectId=self.project_id, body=job_data) \
             .execute()
         job_id = query_reply['jobReference']['jobId']
-        job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
 
         # Wait for query to finish.
-        while not job['status']['state'] == 'DONE':
-            logging.info('Waiting for job to complete: %s, %s', self.project_id, job_id)
-            time.sleep(5)
-            job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
-
-        # Check if job had errors.
-        if 'errorResult' in job['status']:
-            raise Exception(
-                'BigQuery job failed. Final error was: {}. The job was: {}'.format(
-                    job['status']['errorResult'], job
-                )
-            )
+        keep_polling_job = True
+        while (keep_polling_job):
+            try:
+                job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
+                if (job['status']['state'] == 'DONE'):
+                    keep_polling_job = False
+                    # Check if job had errors.
+                    if 'errorResult' in job['status']:
+                        raise Exception(
+                            'BigQuery job failed. Final error was: {}. The job was: {}'.format(
+                                job['status']['errorResult'], job
+                            )
+                        )
+                else:
+                    logging.info('Waiting for job to complete : %s, %s', self.project_id, job_id)
+                    time.sleep(5)
+
+            except HttpError, err:
+                if err.code in [500, 503]:
+                    logging.info('%s: Retryable error, waiting for job to complete: %s', err.code, job_id)
+                    time.sleep(5)
+                else:
+                    raise Exception(
+                        'BigQuery job status check faild. Final error was: %s', err.code)
 
         return job_id
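The commit's retry loop targets Python 2 (`except HttpError, err:`) and busy-polls at a fixed 5-second interval. A minimal Python 3 sketch of the same pattern follows; note that `TransientHttpError` and `get_job` are hypothetical stand-ins for `googleapiclient.errors.HttpError` and the `jobs().get(...).execute()` call, not the real API:

```python
import time

RETRYABLE_CODES = {500, 503}  # transient server-side errors worth retrying


class TransientHttpError(Exception):
    """Hypothetical stand-in for googleapiclient.errors.HttpError."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code


def wait_for_job(get_job, job_id, poll_interval=5, sleep=time.sleep):
    """Poll get_job(job_id) until the job reports state 'DONE'.

    HTTP 500/503 is treated as retryable; any other HTTP error aborts
    the wait, mirroring the behavior of the commit above.
    """
    while True:
        try:
            job = get_job(job_id)
        except TransientHttpError as err:
            if err.code in RETRYABLE_CODES:
                sleep(poll_interval)  # transient error: wait and re-poll
                continue
            raise RuntimeError(
                'BigQuery job status check failed: HTTP %s' % err.code)
        if job['status']['state'] == 'DONE':
            # Finished: surface any job-level error, else return.
            if 'errorResult' in job['status']:
                raise RuntimeError(
                    'BigQuery job failed: %s' % job['status']['errorResult'])
            return job_id
        sleep(poll_interval)  # still running: wait before the next poll
```

Injecting `sleep` keeps the sketch testable without real delays; a production version might also cap the number of attempts or use exponential backoff rather than a constant interval.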
[jira] [Created] (AIRFLOW-701) Add Lemann Foundation as an Airflow user
Fernando Paiva created AIRFLOW-701:
--------------------------------------

             Summary: Add Lemann Foundation as an Airflow user
                 Key: AIRFLOW-701
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-701
             Project: Apache Airflow
          Issue Type: Task
            Reporter: Fernando Paiva
            Priority: Trivial
[jira] [Resolved] (AIRFLOW-700) Default config link to web authentication is out of date
[ https://issues.apache.org/jira/browse/AIRFLOW-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Cruickshank resolved AIRFLOW-700.
--------------------------------------
    Resolution: Implemented

Pull request pending: https://github.com/apache/incubator-airflow/pull/1943

> Default config link to web authentication is out of date
> --------------------------------------------------------
>
>                 Key: AIRFLOW-700
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-700
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core, Documentation
>    Affects Versions: Airflow 2.0, Airflow 1.8, Airflow 1.7.1
>            Reporter: Alan Cruickshank
>            Assignee: Alan Cruickshank
>            Priority: Trivial
>              Labels: newbie
>             Fix For: Airflow 2.0
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> In airflow/configuration.py, the default config file has an old URL for the web authentication part of the documentation. The current link is:
> http://pythonhosted.org/airflow/installation.html#web-authentication
> which should be changed to:
> http://pythonhosted.org/airflow/security.html#web-authentication
[jira] [Created] (AIRFLOW-700) Default config link to web authentication is out of date
Alan Cruickshank created AIRFLOW-700:
------------------------------------

             Summary: Default config link to web authentication is out of date
                 Key: AIRFLOW-700
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-700
             Project: Apache Airflow
          Issue Type: Bug
          Components: core, Documentation
    Affects Versions: Airflow 2.0, Airflow 1.8, Airflow 1.7.1
            Reporter: Alan Cruickshank
            Assignee: Alan Cruickshank
            Priority: Trivial
             Fix For: Airflow 2.0

In airflow/configuration.py, the default config file has an old URL for the web authentication part of the documentation. The current link is:
http://pythonhosted.org/airflow/installation.html#web-authentication
which should be changed to:
http://pythonhosted.org/airflow/security.html#web-authentication
[jira] [Created] (AIRFLOW-699) Dag can't be triggered at the same second due to constraint on dag_id + execution_date
Olivier Girardot created AIRFLOW-699:
------------------------------------

             Summary: Dag can't be triggered at the same second due to constraint on dag_id + execution_date
                 Key: AIRFLOW-699
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-699
             Project: Apache Airflow
          Issue Type: Bug
          Components: db
    Affects Versions: Airflow 1.7.1.3
            Reporter: Olivier Girardot

We have a system that triggers DAGs when several files arrive in HDFS. We craft a distinct run_id to trace each trigger, but since the schema of the dag_run table is:

{code:sql}
CREATE TABLE `dag_run` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dag_id` varchar(250) DEFAULT NULL,
  `execution_date` datetime DEFAULT NULL,
  `state` varchar(50) DEFAULT NULL,
  `run_id` varchar(250) DEFAULT NULL,
  `external_trigger` tinyint(1) DEFAULT NULL,
  `conf` blob,
  `end_date` datetime DEFAULT NULL,
  `start_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `dag_id` (`dag_id`,`execution_date`),
  UNIQUE KEY `dag_id_2` (`dag_id`,`run_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2998 DEFAULT CHARSET=latin1
{code}

we end up with a duplicate-entry exception:

{noformat}
sqlalchemy.exc.IntegrityError: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 'my-job-2016-12-13 19:52:33' for key 'dag_id'") [SQL: u'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, external_trigger, conf) VALUES (%s, %s, now(), %s, %s, %s, %s, %s)'] [parameters: ('my-job', datetime.datetime(2016, 12, 13, 19, 52, 33, 210790), None, u'running', 'my-job-custom-run-id_2016-12-13T19:52:32.291_785a2860-c622-47f6-a29c-4c6394f931fa', 1, "...")
{noformat}

Is there any need for this constraint? The second-level precision of the datetime column is problematic for us, because it is common for DAGs to be triggered within the same second.
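The collision above is a precision effect: a MySQL `datetime` column without fractional seconds drops the microsecond part on storage, so two triggers inside the same second map to the same `(dag_id, execution_date)` key even though their Python datetimes differ. A small illustration, simulating the truncation in pure Python (`mysql_datetime` is a hypothetical helper, not anything in Airflow or MySQL):

```python
from datetime import datetime

# Two externally triggered runs of the same DAG, microseconds apart
# (the first value matches the datetime in the error above).
t1 = datetime(2016, 12, 13, 19, 52, 33, 210790)
t2 = datetime(2016, 12, 13, 19, 52, 33, 987001)

def mysql_datetime(dt):
    """Simulate storage in a MySQL DATETIME column without fractional
    seconds: the microsecond component is discarded."""
    return dt.replace(microsecond=0)

# Distinct in Python, identical once stored, so the UNIQUE KEY
# (dag_id, execution_date) rejects the second INSERT.
assert t1 != t2
assert mysql_datetime(t1) == mysql_datetime(t2)
```

One possible mitigation, assuming a MySQL version that supports fractional seconds, would be widening the column to `DATETIME(6)` so microseconds survive storage; whether the uniqueness constraint is needed at all is the question the issue raises.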