[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r238290290 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -473,11 +482,11 @@ def create_external_table(self, def run_query(self, bql=None, sql=None, - destination_dataset_table=False, + destination_dataset_table=None, write_disposition='WRITE_EMPTY', allow_large_results=False, - flatten_results=None, - udf_config=False, + flatten_results=False, Review comment: @draksler, it is very sad that this was not covered by tests in BigQueryOperator and that it did not match the docstring, because udf_config has a list type, not bool. We need to open a bug issue and make a fix. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
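The comment above points out that udf_config must be a list, not a bool. A minimal sketch of the type check run_query performs, with a hypothetical UDF resource list for illustration (the bucket path and inline function are not from the PR):

```python
# Illustrative udf_config value: a list of user-defined function
# resources, as the docstring says -- NOT a bool. Entries are hypothetical.
udf_config = [
    {"resourceUri": "gs://some-bucket/some_udf.js"},           # hypothetical URI
    {"inlineCode": "function plus_one(x) { return x + 1; }"},  # hypothetical UDF
]

def validate_udf_config(udf_config):
    # Mirrors the check in run_query: raise TypeError when udf_config
    # is set (truthy) but is not a list.
    if udf_config and not isinstance(udf_config, list):
        raise TypeError("udf_config argument must have a type 'list'"
                        " not {}".format(type(udf_config)))
    return udf_config

validate_udf_config(udf_config)  # a list passes through unchanged
```

Note the falsy legacy default (`udf_config=False`) slips past a truthiness check silently, which is exactly why the default/type mismatch went unnoticed without tests.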
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r214749960 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -518,6 +528,13 @@ def run_query(self, :param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false). If `None`, defaults to `self.use_legacy_sql`. :type use_legacy_sql: boolean +:param api_resource_configs: a dictionary that contains the params +'configuration' applied for the Google BigQuery Jobs API: +https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs +for example, {'query': {'useQueryCache': False}}. You could use it +if you need to provide some params that are not supported by the BigQueryHook Review comment: @kaxil, done
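As a usage sketch of the parameter documented above (the hook call shape and the 'bigquery_default' connection id are assumptions, not verified against the final merged code):

```python
# api_resource_configs carries raw 'configuration' fields of the
# BigQuery Jobs API that run_query does not expose as keyword
# arguments -- e.g. disabling the query cache:
api_resource_configs = {'query': {'useQueryCache': False}}

# A call through the hook might then look like this (hedged sketch;
# 'bigquery_default' is an assumed connection id):
#
#   hook = BigQueryHook(bigquery_conn_id='bigquery_default')
#   hook.get_conn().cursor().run_query(
#       sql='SELECT 1',
#       api_resource_configs=api_resource_configs)

assert api_resource_configs['query']['useQueryCache'] is False
```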
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r214549026 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -518,6 +528,11 @@ def run_query(self, :param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false). If `None`, defaults to `self.use_legacy_sql`. :type use_legacy_sql: boolean +:param api_resource_configs: a dictionary that contain params Review comment: @kaxil check it
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r214523981 ## File path: airflow/contrib/operators/bigquery_operator.py ## @@ -118,7 +118,8 @@ def __init__(self, query_params=None, labels=None, priority='INTERACTIVE', - time_partitioning={}, + time_partitioning=None, + api_resource_configs=None, Review comment: @kaxil done
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213896240 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -566,95 +612,108 @@ def run_query(self, 'Airflow.', category=DeprecationWarning) -if sql is None: -raise TypeError('`BigQueryBaseCursor.run_query` missing 1 required ' -'positional argument: `sql`') +if not sql and not configuration['query'].get('query', None): +raise TypeError('`BigQueryBaseCursor.run_query` ' +'missing 1 required positional argument: `sql`') + +# BigQuery also allows you to define how you want a table's schema +# to change as a side effect of a query job for more details: +# https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions -# BigQuery also allows you to define how you want a table's schema to change -# as a side effect of a query job -# for more details: -# https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions allowed_schema_update_options = [ 'ALLOW_FIELD_ADDITION', "ALLOW_FIELD_RELAXATION" ] -if not set(allowed_schema_update_options).issuperset( -set(schema_update_options)): -raise ValueError( -"{0} contains invalid schema update options. " -"Please only use one or more of the following options: {1}" -.format(schema_update_options, allowed_schema_update_options)) -if use_legacy_sql is None: -use_legacy_sql = self.use_legacy_sql +if not set(allowed_schema_update_options + ).issuperset(set(schema_update_options)): +raise ValueError("{0} contains invalid schema update options. " + "Please only use one or more of the following " + "options: {1}" + .format(schema_update_options, + allowed_schema_update_options)) -configuration = { -'query': { -'query': sql, -'useLegacySql': use_legacy_sql, -'maximumBillingTier': maximum_billing_tier, -'maximumBytesBilled': maximum_bytes_billed, -'priority': priority -} -} +if schema_update_options: +if write_disposition not in ["WRITE_APPEND", "WRITE_TRUNCATE"]: +raise ValueError("schema_update_options is only " + "allowed if write_disposition is " + "'WRITE_APPEND' or 'WRITE_TRUNCATE'.") if destination_dataset_table: -if '.' not in destination_dataset_table: -raise ValueError( -'Expected destination_dataset_table name in the format of ' -'.. Got: {}'.format( -destination_dataset_table)) destination_project, destination_dataset, destination_table = \ _split_tablename(table_input=destination_dataset_table, default_project_id=self.project_id) -configuration['query'].update({ -'allowLargeResults': allow_large_results, -'flattenResults': flatten_results, -'writeDisposition': write_disposition, -'createDisposition': create_disposition, -'destinationTable': { -'projectId': destination_project, -'datasetId': destination_dataset, -'tableId': destination_table, -} -}) -if udf_config: -if not isinstance(udf_config, list): -raise TypeError("udf_config argument must have a type 'list'" -" not {}".format(type(udf_config))) -configuration['query'].update({ -'userDefinedFunctionResources': udf_config -}) -if query_params: -if self.use_legacy_sql: -raise ValueError("Query parameters are not allowed when using " - "legacy SQL") -else: -configuration['query']['queryParameters'] = query_params +destination_dataset_table = { +'projectId': destination_project, +'datasetId': destination_dataset, +'tableId': destination_table, +} -if labels: -configuration['labels'] = labels +query_param_list = [ +(sql,
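The two schema-update validations in the diff above read more easily in isolation. A minimal standalone sketch (names taken from the diff; extracting them into a helper function is my framing, not the PR's):

```python
def validate_schema_update_options(schema_update_options, write_disposition):
    # Only these two options are accepted by the BigQuery Jobs API.
    allowed_schema_update_options = [
        'ALLOW_FIELD_ADDITION', 'ALLOW_FIELD_RELAXATION'
    ]
    if not set(allowed_schema_update_options).issuperset(
            set(schema_update_options)):
        raise ValueError(
            "{0} contains invalid schema update options. "
            "Please only use one or more of the following options: {1}"
            .format(schema_update_options, allowed_schema_update_options))
    # Schema changes as a query side effect only make sense when the
    # job may write into an existing table.
    if schema_update_options:
        if write_disposition not in ["WRITE_APPEND", "WRITE_TRUNCATE"]:
            raise ValueError("schema_update_options is only allowed if"
                             " write_disposition is 'WRITE_APPEND' or"
                             " 'WRITE_TRUNCATE'.")

validate_schema_update_options(['ALLOW_FIELD_ADDITION'], 'WRITE_APPEND')  # ok
```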
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213858150 ## File path: airflow/contrib/operators/bigquery_operator.py ## @@ -64,8 +64,9 @@ class BigQueryOperator(BaseOperator): :param udf_config: The User Defined Function configuration for the query. See https://cloud.google.com/bigquery/user-defined-functions for details. :type udf_config: list -:param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false). -:type use_legacy_sql: boolean +:param use_legacy_sql: Whether to use legacy SQL (true) or +standard SQL (false). +:type use_legacy_sql: booleanq Review comment: fixed
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213896015 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -1114,18 +1184,18 @@ def get_schema(self, dataset_id, table_id): :param table_id: the table ID of the requested table :return: a table schema """ -tables_resource = self.service.tables() \ -.get(projectId=self.project_id, datasetId=dataset_id, tableId=table_id) \ -.execute() +tables_resource = self.service.tables().get( +projectId=self.project_id, datasetId=dataset_id, +tableId=table_id).execute() return tables_resource['schema'] def get_tabledata(self, dataset_id, table_id, max_results=None, selected_fields=None, page_token=None, start_index=None): """ -Get the data of a given dataset.table and optionally with selected columns. -see https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list - +Get the data of a given dataset.table and optionally +with selected columns. see: +https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list Review comment: fixed
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213894206 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -1114,18 +1184,18 @@ def get_schema(self, dataset_id, table_id): :param table_id: the table ID of the requested table :return: a table schema """ -tables_resource = self.service.tables() \ -.get(projectId=self.project_id, datasetId=dataset_id, tableId=table_id) \ Review comment: just saw that max-line-length = 90 in Airflow. Removed the changes on line 79. I will be more careful next time
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213895073 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -220,26 +229,29 @@ def create_empty_table(self, :type table_id: str :param schema_fields: If set, the schema field list as defined here: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema -:param labels: a dictionary containing labels for the table, passed to BigQuery +:param labels: a dictionary containing labels for the table, +passed to BigQuery :type labels: dict **Example**: :: -schema_fields=[{"name": "emp_name", "type": "STRING", "mode": "REQUIRED"}, - {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"}] +schema_fields=[ +{"name": "emp_name", "type": "STRING", "mode": "REQUIRED"}, +{"name": "salary", "type": "INTEGER", "mode": "NULLABLE"} +] :type schema_fields: list -:param time_partitioning: configure optional time partitioning fields i.e. -partition by field, type and expiration as per API specifications. +:param time_partitioning: configure optional time partitioning +fields i.e. partition by field, type and expiration +as per API specifications. .. seealso:: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#timePartitioning :type time_partitioning: dict :return: """ -if time_partitioning is None: Review comment: DRY logic: this check appeared in several places in the code, and the only place where the type of the local var time_partitioning is critical is the function _cleanse_time_partitioning, so I moved the check inside that function, just to avoid duplicating it
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213858150 ## File path: airflow/contrib/operators/bigquery_operator.py ## @@ -64,8 +64,9 @@ class BigQueryOperator(BaseOperator): :param udf_config: The User Defined Function configuration for the query. See https://cloud.google.com/bigquery/user-defined-functions for details. :type udf_config: list -:param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false). -:type use_legacy_sql: boolean +:param use_legacy_sql: Whether to use legacy SQL (true) or +standard SQL (false). +:type use_legacy_sql: booleanq Review comment: Yeah, I will change it. I broke my mind over an error in a test with the bql deprecation warning. It is ok in python 3.6 but errors with python 2.7.. but it did not error before and does not fail on the master branch, even though I did not touch the warning or the test... I have been trying to fix it for several days; the funniest part is that in my local env this test always fails with python 2.7 ... panic
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212533643 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -587,15 +603,22 @@ def run_query(self, if use_legacy_sql is None: use_legacy_sql = self.use_legacy_sql -configuration = { -'query': { +configuration = deepcopy(api_resource_configs) + +query_default = { 'query': sql, 'useLegacySql': use_legacy_sql, 'maximumBillingTier': maximum_billing_tier, 'maximumBytesBilled': maximum_bytes_billed, 'priority': priority } -} + +if not 'query' in configuration: +configuration['query'] = query_default +else: +for param in query_default: +if param not in configuration['query']: Review comment: I take apart it on two types - values what default by None and we raise the error as with sql. So priority will be on arg keyword, and only if it not exist - check api_resource and get key from it. Or you think need to raise error anyway if keys are duplicated? Or maybe just warning. Also, we can decide the priority of ways to passing args and if key provided with arg to method or operator - ignore it in api_ var and just write warning. 2-nd thing is keys what have default values like useLegacySql or priority. Because they are is not None by default we can not check what they were set by a user or not. But, I really don't see necessary for those default settings, because they default in google API and as I understand we do not need to provide them. More I look on this config in the run_query, more want to change globally logic of creating conf for API request, or refactor part with configuration generating or forget idea with api_ var and just add useQueryCache param )) too many ifs... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
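The precedence discussed in the comment above can be sketched as a standalone function; this is a minimal illustration (keys already present in api_resource_configs are kept, and query_default only fills in the missing ones), with `build_configuration` as a hypothetical helper name, not the actual hook code:

```python
from copy import deepcopy

def build_configuration(api_resource_configs, sql, use_legacy_sql,
                        maximum_billing_tier=None,
                        maximum_bytes_billed=None,
                        priority='INTERACTIVE'):
    # Start from a copy of the user-supplied dict so the caller's
    # api_resource_configs is never mutated.
    configuration = deepcopy(api_resource_configs or {})

    # Defaults derived from the keyword arguments.
    query_default = {
        'query': sql,
        'useLegacySql': use_legacy_sql,
        'maximumBillingTier': maximum_billing_tier,
        'maximumBytesBilled': maximum_bytes_billed,
        'priority': priority,
    }

    if 'query' not in configuration:
        configuration['query'] = query_default
    else:
        # Only fill in keys the user did not set themselves:
        # api_resource_configs wins over the defaults.
        for param, value in query_default.items():
            configuration['query'].setdefault(param, value)
    return configuration
```

With this shape, a user passing `{'query': {'useQueryCache': False}}` keeps their useQueryCache value while still receiving the sql and dialect defaults.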
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212469186

## File path: airflow/contrib/hooks/bigquery_hook.py ##

@@ -656,6 +668,21 @@ def run_query(self,
             configuration['query'][
                 'schemaUpdateOptions'] = schema_update_options

+        if not api_resource_configs:
+            api_resource_configs = self.api_resource_configs
+
+        if 'configuration' in api_resource_configs and isinstance(

Review comment: Please check.
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212466137

## File path: airflow/contrib/hooks/bigquery_hook.py ##

@@ -656,6 +668,21 @@ def run_query(self,
             configuration['query'][
                 'schemaUpdateOptions'] = schema_update_options

+        if not api_resource_configs:
+            api_resource_configs = self.api_resource_configs
+
+        if 'configuration' in api_resource_configs and isinstance(

Review comment: Do you mean to first copy api_resource_configs into configuration, and then add the other params to configuration only if they do not already exist? Do I understand correctly?
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212444862

## File path: tests/contrib/hooks/test_bigquery_hook.py ##

@@ -281,6 +281,18 @@ def test_run_query_sql_dialect_override(self, run_with_config):
             args, kwargs = run_with_config.call_args
             self.assertIs(args[0]['query']['useLegacySql'], bool_val)

+    @mock.patch.object(hook.BigQueryBaseCursor, 'run_with_configuration')
+    def test_api_resource_configs(self, run_with_config):
+        for bool_val in [True, False]:
+            cursor = hook.BigQueryBaseCursor(mock.Mock(), "project_id")
+            cursor.run_query('query',
+                             api_resource_configs={
+                                 'configuration':
+                                     {'query': {'useQueryCache': bool_val}}})
+
+            args, kwargs = run_with_config.call_args
+            self.assertIs(args[0]['query']['useQueryCache'], bool_val)

Review comment: And added a check that the default values exist and were not deleted.
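The extra check described in this comment could look roughly like the following; `merge_query_config` is a hypothetical stand-in for the hook's merge step, not the actual Airflow implementation:

```python
from copy import deepcopy

# Hypothetical stand-in for the hook's merge step (names are
# illustrative, not the real Airflow API).
def merge_query_config(user_config, defaults):
    merged = deepcopy(user_config)
    merged.setdefault('query', {})
    for key, value in defaults['query'].items():
        merged['query'].setdefault(key, value)
    return merged

def test_defaults_survive_user_overrides():
    defaults = {'query': {'query': 'SELECT 1',
                          'useLegacySql': True,
                          'priority': 'INTERACTIVE'}}
    for bool_val in (True, False):
        merged = merge_query_config(
            {'query': {'useQueryCache': bool_val}}, defaults)
        # The user-provided key is honoured...
        assert merged['query']['useQueryCache'] is bool_val
        # ...and the default values still exist and were not deleted.
        assert merged['query']['useLegacySql'] is True
        assert merged['query']['priority'] == 'INTERACTIVE'

test_defaults_survive_user_overrides()
```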
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212443568

## File path: tests/contrib/hooks/test_bigquery_hook.py ##

@@ -281,6 +281,18 @@ def test_run_query_sql_dialect_override(self, run_with_config):
             args, kwargs = run_with_config.call_args
             self.assertIs(args[0]['query']['useLegacySql'], bool_val)

+    @mock.patch.object(hook.BigQueryBaseCursor, 'run_with_configuration')
+    def test_api_resource_configs(self, run_with_config):
+        for bool_val in [True, False]:
+            cursor = hook.BigQueryBaseCursor(mock.Mock(), "project_id")
+            cursor.run_query('query',
+                             api_resource_configs={
+                                 'configuration':
+                                     {'query': {'useQueryCache': bool_val}}})
+
+            args, kwargs = run_with_config.call_args
+            self.assertIs(args[0]['query']['useQueryCache'], bool_val)

Review comment: Resolved by changing the logic as described in the comment above.
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211637030

## File path: airflow/contrib/hooks/bigquery_hook.py ##

@@ -656,6 +671,10 @@ def run_query(self,
             configuration['query'][
                 'schemaUpdateOptions'] = schema_update_options

+        if 'configuration' in api_resource_configs:
+            for key in api_resource_configs['configuration']:
+                configuration[key] = api_resource_configs['configuration'][key]

Review comment: @tswast, please check this.
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211634204

## File path: airflow/contrib/hooks/bigquery_hook.py ##

@@ -656,6 +671,10 @@ def run_query(self,
             configuration['query'][
                 'schemaUpdateOptions'] = schema_update_options

+        if 'configuration' in api_resource_configs:

Review comment: I thought of this param as a way to set any settings supported by the API in one dict (by analogy with 'src_fmt_configs'). But if we say it is only a way to set 'configuration', and we will not allow sending any other params this way, for example 'jobReference' or 'statistics', then it's not needed. What do you think? Use it only for 'configuration'?
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211629993

## File path: airflow/contrib/hooks/bigquery_hook.py ##

@@ -656,6 +671,10 @@ def run_query(self,
             configuration['query'][
                 'schemaUpdateOptions'] = schema_update_options

+        if 'configuration' in api_resource_configs:
+            for key in api_resource_configs['configuration']:
+                configuration[key] = api_resource_configs['configuration'][key]

Review comment: @tswast, I got it, sorry, you're right.
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211625138

## File path: airflow/contrib/hooks/bigquery_hook.py ##

@@ -656,6 +671,10 @@ def run_query(self,
             configuration['query'][
                 'schemaUpdateOptions'] = schema_update_options

+        if 'configuration' in api_resource_configs:
+            for key in api_resource_configs['configuration']:
+                configuration[key] = api_resource_configs['configuration'][key]

Review comment: Hmm... no, it will overwrite only the keys that the user adds in api_resource_configs, or I didn't catch what you mean.
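The subtlety in this exchange: copying api_resource_configs['configuration'] key by key at the top level replaces the entire nested 'query' dict, so any defaults run_query already placed under 'query' are lost whenever the user supplies their own 'query' sub-dict. A minimal illustration (dict contents invented for the example):

```python
# Defaults already assembled by run_query (values invented for illustration).
configuration = {'query': {'query': 'SELECT 1', 'useLegacySql': True}}

# User-supplied override, shaped as in the diff above.
api_resource_configs = {'configuration': {'query': {'useQueryCache': False}}}

# Top-level key-by-key copy, as in the quoted diff.
for key in api_resource_configs['configuration']:
    configuration[key] = api_resource_configs['configuration'][key]

# The whole 'query' dict was replaced, so the defaults are gone.
assert configuration == {'query': {'useQueryCache': False}}
assert 'useLegacySql' not in configuration['query']
```

This is why the PR later moved to a deepcopy of api_resource_configs followed by a per-key merge of the defaults.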