[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-12-03 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r238290290
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -473,11 +482,11 @@ def create_external_table(self,
 def run_query(self,
   bql=None,
   sql=None,
-  destination_dataset_table=False,
+  destination_dataset_table=None,
   write_disposition='WRITE_EMPTY',
   allow_large_results=False,
-  flatten_results=None,
-  udf_config=False,
+  flatten_results=False,
 
 Review comment:
  @draksler, it is unfortunate that this was not covered by tests in
BigQueryOperator and did not match the docstring, because udf_config has a
list type, not bool. We need to open a bug issue and make a fix.
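The type bug called out above can be guarded against with an explicit check. A minimal sketch (the helper name `validate_udf_config` is hypothetical; the hook performs this check inline):

```python
def validate_udf_config(udf_config):
    # udf_config must be a list of UDF resource dicts, never a bool;
    # passing False (the old default) should fail loudly, not be accepted.
    if udf_config is not None and not isinstance(udf_config, list):
        raise TypeError("udf_config argument must have a type 'list'"
                        " not {}".format(type(udf_config)))
    return udf_config

# A well-formed value, per the BigQuery Jobs API shape:
udfs = validate_udf_config([{'resourceUri': 'gs://bucket/my_udf.js'}])
```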


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-09-03 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r214749960
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -518,6 +528,13 @@ def run_query(self,
 :param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false).
 If `None`, defaults to `self.use_legacy_sql`.
 :type use_legacy_sql: boolean
+:param api_resource_configs: a dictionary that contains 'configuration'
+parameters applied to the Google BigQuery Jobs API:
+https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs
+for example, {'query': {'useQueryCache': False}}. You can use it
+if you need to provide parameters that are not supported by the BigQueryHook
 
 Review comment:
  @kaxil, done
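As a usage illustration of the parameter documented above: api_resource_configs supplies raw Jobs API 'configuration' keys, such as disabling the query cache. A sketch under stated assumptions (`build_query_configuration` is a hypothetical helper, not the hook's actual code):

```python
from copy import deepcopy

def build_query_configuration(sql, api_resource_configs=None):
    # Start from the caller's raw Jobs API config and make sure the
    # query text itself is present under configuration.query.query.
    configuration = deepcopy(api_resource_configs or {})
    configuration.setdefault('query', {})
    configuration['query'].setdefault('query', sql)
    return configuration

cfg = build_query_configuration(
    'SELECT 1', api_resource_configs={'query': {'useQueryCache': False}})
# cfg == {'query': {'useQueryCache': False, 'query': 'SELECT 1'}}
```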




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-09-02 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r214549026
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -518,6 +528,11 @@ def run_query(self,
 :param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false).
 If `None`, defaults to `self.use_legacy_sql`.
 :type use_legacy_sql: boolean
+:param api_resource_configs: a dictionary that contain params
 
 Review comment:
   @kaxil check it





[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-09-01 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r214523981
 
 

 ##
 File path: airflow/contrib/operators/bigquery_operator.py
 ##
 @@ -118,7 +118,8 @@ def __init__(self,
  query_params=None,
  labels=None,
  priority='INTERACTIVE',
- time_partitioning={},
+ time_partitioning=None,
+ api_resource_configs=None,
 
 Review comment:
   @kaxil done
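The diff above replaces the mutable default `time_partitioning={}` with `None`. A short sketch of why that matters in Python (illustrative names, not the operator's code): a default dict or list is created once and shared across all calls.

```python
def add_option_bad(key, value, opts={}):
    # Anti-pattern: the same dict object is reused on every call.
    opts[key] = value
    return opts

def add_option_good(key, value, opts=None):
    # The fix mirrored by the diff: default to None, create per call.
    if opts is None:
        opts = {}
    opts[key] = value
    return opts

add_option_bad('a', 1)
leaky = add_option_bad('b', 2)   # {'a': 1, 'b': 2} - state leaked in
add_option_good('a', 1)
clean = add_option_good('b', 2)  # {'b': 2} - calls stay independent
```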




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-30 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213896240
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -566,95 +612,108 @@ def run_query(self,
   'Airflow.',
   category=DeprecationWarning)
 
-if sql is None:
-raise TypeError('`BigQueryBaseCursor.run_query` missing 1 required '
-'positional argument: `sql`')
+if not sql and not configuration['query'].get('query', None):
+raise TypeError('`BigQueryBaseCursor.run_query` '
+'missing 1 required positional argument: `sql`')
+
+# BigQuery also allows you to define how you want a table's schema
+# to change as a side effect of a query job for more details:
+# https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions
 
-# BigQuery also allows you to define how you want a table's schema to change
-# as a side effect of a query job
-# for more details:
-#   https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions
 allowed_schema_update_options = [
 'ALLOW_FIELD_ADDITION', "ALLOW_FIELD_RELAXATION"
 ]
-if not set(allowed_schema_update_options).issuperset(
-set(schema_update_options)):
-raise ValueError(
-"{0} contains invalid schema update options. "
-"Please only use one or more of the following options: {1}"
-.format(schema_update_options, allowed_schema_update_options))
 
-if use_legacy_sql is None:
-use_legacy_sql = self.use_legacy_sql
+if not set(allowed_schema_update_options
+   ).issuperset(set(schema_update_options)):
+raise ValueError("{0} contains invalid schema update options. "
+ "Please only use one or more of the following "
+ "options: {1}"
+ .format(schema_update_options,
+ allowed_schema_update_options))
 
-configuration = {
-'query': {
-'query': sql,
-'useLegacySql': use_legacy_sql,
-'maximumBillingTier': maximum_billing_tier,
-'maximumBytesBilled': maximum_bytes_billed,
-'priority': priority
-}
-}
+if schema_update_options:
+if write_disposition not in ["WRITE_APPEND", "WRITE_TRUNCATE"]:
+raise ValueError("schema_update_options is only "
+ "allowed if write_disposition is "
+ "'WRITE_APPEND' or 'WRITE_TRUNCATE'.")
 
 if destination_dataset_table:
-if '.' not in destination_dataset_table:
-raise ValueError(
-'Expected destination_dataset_table name in the format of '
-'.. Got: {}'.format(
-destination_dataset_table))
 destination_project, destination_dataset, destination_table = \
 _split_tablename(table_input=destination_dataset_table,
  default_project_id=self.project_id)
-configuration['query'].update({
-'allowLargeResults': allow_large_results,
-'flattenResults': flatten_results,
-'writeDisposition': write_disposition,
-'createDisposition': create_disposition,
-'destinationTable': {
-'projectId': destination_project,
-'datasetId': destination_dataset,
-'tableId': destination_table,
-}
-})
-if udf_config:
-if not isinstance(udf_config, list):
-raise TypeError("udf_config argument must have a type 'list'"
-" not {}".format(type(udf_config)))
-configuration['query'].update({
-'userDefinedFunctionResources': udf_config
-})
 
-if query_params:
-if self.use_legacy_sql:
-raise ValueError("Query parameters are not allowed when using "
- "legacy SQL")
-else:
-configuration['query']['queryParameters'] = query_params
+destination_dataset_table = {
+'projectId': destination_project,
+'datasetId': destination_dataset,
+'tableId': destination_table,
+}
 
-if labels:
-configuration['labels'] = labels
+query_param_list = [
+(sql, 

[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-30 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213894129
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -566,95 +612,108 @@ def run_query(self,
   'Airflow.',
   category=DeprecationWarning)
 
-if sql is None:
-raise TypeError('`BigQueryBaseCursor.run_query` missing 1 required '
-'positional argument: `sql`')
+if not sql and not configuration['query'].get('query', None):
+raise TypeError('`BigQueryBaseCursor.run_query` '
+'missing 1 required positional argument: `sql`')
+
+# BigQuery also allows you to define how you want a table's schema
+# to change as a side effect of a query job for more details:
+# https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions
 
-# BigQuery also allows you to define how you want a table's schema to change
-# as a side effect of a query job
-# for more details:
-#   https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions
 allowed_schema_update_options = [
 'ALLOW_FIELD_ADDITION', "ALLOW_FIELD_RELAXATION"
 ]
-if not set(allowed_schema_update_options).issuperset(
-set(schema_update_options)):
-raise ValueError(
-"{0} contains invalid schema update options. "
-"Please only use one or more of the following options: {1}"
-.format(schema_update_options, allowed_schema_update_options))
 
-if use_legacy_sql is None:
-use_legacy_sql = self.use_legacy_sql
+if not set(allowed_schema_update_options
+   ).issuperset(set(schema_update_options)):
+raise ValueError("{0} contains invalid schema update options. "
+ "Please only use one or more of the following "
+ "options: {1}"
+ .format(schema_update_options,
+ allowed_schema_update_options))
 
-configuration = {
-'query': {
-'query': sql,
-'useLegacySql': use_legacy_sql,
-'maximumBillingTier': maximum_billing_tier,
-'maximumBytesBilled': maximum_bytes_billed,
-'priority': priority
-}
-}
+if schema_update_options:
+if write_disposition not in ["WRITE_APPEND", "WRITE_TRUNCATE"]:
+raise ValueError("schema_update_options is only "
+ "allowed if write_disposition is "
+ "'WRITE_APPEND' or 'WRITE_TRUNCATE'.")
 
 if destination_dataset_table:
-if '.' not in destination_dataset_table:
-raise ValueError(
-'Expected destination_dataset_table name in the format of '
-'.. Got: {}'.format(
-destination_dataset_table))
 destination_project, destination_dataset, destination_table = \
 _split_tablename(table_input=destination_dataset_table,
  default_project_id=self.project_id)
-configuration['query'].update({
-'allowLargeResults': allow_large_results,
-'flattenResults': flatten_results,
-'writeDisposition': write_disposition,
-'createDisposition': create_disposition,
-'destinationTable': {
-'projectId': destination_project,
-'datasetId': destination_dataset,
-'tableId': destination_table,
-}
-})
-if udf_config:
-if not isinstance(udf_config, list):
-raise TypeError("udf_config argument must have a type 'list'"
-" not {}".format(type(udf_config)))
-configuration['query'].update({
-'userDefinedFunctionResources': udf_config
-})
 
-if query_params:
-if self.use_legacy_sql:
-raise ValueError("Query parameters are not allowed when using "
- "legacy SQL")
-else:
-configuration['query']['queryParameters'] = query_params
+destination_dataset_table = {
+'projectId': destination_project,
+'datasetId': destination_dataset,
+'tableId': destination_table,
+}
 
-if labels:
-configuration['labels'] = labels
+query_param_list = [
+(sql, 

[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-30 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213858150
 
 

 ##
 File path: airflow/contrib/operators/bigquery_operator.py
 ##
 @@ -64,8 +64,9 @@ class BigQueryOperator(BaseOperator):
 :param udf_config: The User Defined Function configuration for the query.
 See https://cloud.google.com/bigquery/user-defined-functions for details.
 :type udf_config: list
-:param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false).
-:type use_legacy_sql: boolean
+:param use_legacy_sql: Whether to use legacy SQL (true) or
+standard SQL (false).
+:type use_legacy_sql: booleanq
 
 Review comment:
   fixed




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-29 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213896015
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -1114,18 +1184,18 @@ def get_schema(self, dataset_id, table_id):
 :param table_id: the table ID of the requested table
 :return: a table schema
 """
-tables_resource = self.service.tables() \
-.get(projectId=self.project_id, datasetId=dataset_id, tableId=table_id) \
-.execute()
+tables_resource = self.service.tables().get(
+projectId=self.project_id, datasetId=dataset_id,
+tableId=table_id).execute()
 return tables_resource['schema']
 
 def get_tabledata(self, dataset_id, table_id,
   max_results=None, selected_fields=None, page_token=None,
   start_index=None):
 """
-Get the data of a given dataset.table and optionally with selected columns.
-see https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list
-
+Get the data of a given dataset.table and optionally
+with selected columns. see:
+https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list
 
 Review comment:
   fixed




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-29 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213894206
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -1114,18 +1184,18 @@ def get_schema(self, dataset_id, table_id):
 :param table_id: the table ID of the requested table
 :return: a table schema
 """
-tables_resource = self.service.tables() \
-.get(projectId=self.project_id, datasetId=dataset_id, tableId=table_id) \
 
 Review comment:
  I just saw that max-line-length = 90 in Airflow, so I removed the changes
to line 79.
  
  I will be more careful next time
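For reference, a 90-character limit like the one mentioned is typically declared in the project's linter configuration; an illustrative fragment (the exact file and location in Airflow may differ):

```ini
[flake8]
max-line-length = 90
```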




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-29 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213895073
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -220,26 +229,29 @@ def create_empty_table(self,
 :type table_id: str
 :param schema_fields: If set, the schema field list as defined here:
 
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema
-:param labels: a dictionary containing labels for the table, passed to BigQuery
+:param labels: a dictionary containing labels for the table,
+passed to BigQuery
 :type labels: dict
 
 **Example**: ::
 
-schema_fields=[{"name": "emp_name", "type": "STRING", "mode": "REQUIRED"},
-   {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"}]
+schema_fields=[
+{"name": "emp_name", "type": "STRING", "mode": "REQUIRED"},
+{"name": "salary", "type": "INTEGER", "mode": "NULLABLE"}
+]
 
 :type schema_fields: list
-:param time_partitioning: configure optional time partitioning fields i.e.
-partition by field, type and expiration as per API specifications.
+:param time_partitioning: configure optional time partitioning
+fields i.e. partition by field, type and expiration
+as per API specifications.
 
 .. seealso::
 
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#timePartitioning
 :type time_partitioning: dict
 
 :return:
 """
-if time_partitioning is None:
 
 Review comment:
  To keep the logic DRY: this check was duplicated in several places in the
code, and the only place where the type of the local variable
time_partitioning was critical is the function _cleanse_time_partitioning,
so I moved the check inside that function, simply to avoid duplicating it.




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-29 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213858150
 
 

 ##
 File path: airflow/contrib/operators/bigquery_operator.py
 ##
 @@ -64,8 +64,9 @@ class BigQueryOperator(BaseOperator):
 :param udf_config: The User Defined Function configuration for the query.
 See https://cloud.google.com/bigquery/user-defined-functions for details.
 :type udf_config: list
-:param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false).
-:type use_legacy_sql: boolean
+:param use_legacy_sql: Whether to use legacy SQL (true) or
+standard SQL (false).
+:type use_legacy_sql: booleanq
 
 Review comment:
  Yeah, I will change it. I have been struggling with an error in the test for
the deprecated bql warning. It passes with Python 3.5 but fails with Python
2.7, even though it did not fail before and does not fail on the master
branch, and I did not touch the warning or the test... I have been trying to
fix it for several days; the funniest part is that in my local environment
this test always fails with Python 2.7 (on master too)... panic
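One plausible source of the Python 2.7-only failure described above is the default warning filter: the interpreter may ignore or deduplicate DeprecationWarning, so a warning test can pass on one version and fail on another. A hedged sketch (the `run_query_with_bql` stand-in is hypothetical) of capturing such a warning deterministically:

```python
import warnings

def run_query_with_bql(bql):
    # Stand-in for the deprecated code path: warn, then proceed.
    warnings.warn("Deprecated parameter `bql` used; use `sql` instead",
                  DeprecationWarning)
    return bql

with warnings.catch_warnings(record=True) as caught:
    # 'always' overrides the default ignore/once filters on every interpreter.
    warnings.simplefilter("always")
    run_query_with_bql("SELECT 1")

assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```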






[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-24 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212533643
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -587,15 +603,22 @@ def run_query(self,
 if use_legacy_sql is None:
 use_legacy_sql = self.use_legacy_sql
 
-configuration = {
-'query': {
+configuration = deepcopy(api_resource_configs)
+
+query_default =  {
 'query': sql,
 'useLegacySql': use_legacy_sql,
 'maximumBillingTier': maximum_billing_tier,
 'maximumBytesBilled': maximum_bytes_billed,
 'priority': priority
 }
-}
+
+if not 'query' in configuration:
+configuration['query'] = query_default
+else:
+for param in query_default:
+if param not in configuration['query']:
 
 Review comment:
   I split it into two types. For values that default to None we raise an error, 
as with sql: the keyword argument has priority, and only if it is not set do we 
check api_resource_configs and take the key from there. Or do you think we should 
raise an error whenever keys are duplicated? Or maybe just a warning.
   We could also fix the priority between the two ways of passing args: if a key 
is provided as an argument to the method or operator, ignore it in 
api_resource_configs and just log a warning.
   The second type is keys that have non-None defaults, like useLegacySql or 
priority. Because they are not None by default, we cannot tell whether the user 
set them explicitly or not. But I really do not see the need for those defaults, 
because they are already the defaults in the Google API, so as I understand it we 
do not need to send them.
   
   The more I look at this config in run_query, the more I want to change the 
configuration-building logic globally, refactor the part that generates the 
configuration, or drop the api_resource_configs idea and just add a useQueryCache 
param )) too many ifs...
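The precedence scheme described above (explicit keyword arguments win, api_resource_configs fills the rest, duplicates produce a warning) could be sketched as a standalone helper. `build_query_configuration` is a hypothetical name for illustration, not the hook's actual code:

```python
import warnings
from copy import deepcopy

def build_query_configuration(api_resource_configs=None, **query_params):
    # Copy first so the caller's api_resource_configs is never mutated.
    configuration = deepcopy(api_resource_configs) if api_resource_configs else {}
    query = configuration.setdefault('query', {})
    for key, value in query_params.items():
        if value is None:
            # Unset keyword args never clobber values from api_resource_configs.
            continue
        if key in query and query[key] != value:
            warnings.warn("'%s' is set both as a keyword argument and in "
                          "api_resource_configs; the keyword argument wins" % key)
        query[key] = value
    return configuration

user_configs = {'query': {'useQueryCache': False}}
cfg = build_query_configuration(user_configs, query='SELECT 1',
                                useLegacySql=True, maximumBillingTier=None)
assert cfg['query'] == {'useQueryCache': False, 'query': 'SELECT 1',
                        'useLegacySql': True}
assert user_configs == {'query': {'useQueryCache': False}}  # caller untouched
```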






[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-23 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212469186
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -656,6 +668,21 @@ def run_query(self,
 configuration['query'][
 'schemaUpdateOptions'] = schema_update_options
 
+if not api_resource_configs:
+api_resource_configs = self.api_resource_configs
+
+if 'configuration' in api_resource_configs and isinstance(
 
 Review comment:
   check pls




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-23 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212466137
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -656,6 +668,21 @@ def run_query(self,
 configuration['query'][
 'schemaUpdateOptions'] = schema_update_options
 
+if not api_resource_configs:
+api_resource_configs = self.api_resource_configs
+
+if 'configuration' in api_resource_configs and isinstance(
 
 Review comment:
   Do you mean to first copy api_resource_configs into configuration, and then add 
the other params to configuration only if they are not already there? Did I understand correctly? 






[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-23 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212444862
 
 

 ##
 File path: tests/contrib/hooks/test_bigquery_hook.py
 ##
 @@ -281,6 +281,18 @@ def test_run_query_sql_dialect_override(self, 
run_with_config):
 args, kwargs = run_with_config.call_args
 self.assertIs(args[0]['query']['useLegacySql'], bool_val)
 
+@mock.patch.object(hook.BigQueryBaseCursor, 'run_with_configuration')
+def test_api_resource_configs(self, run_with_config):
+for bool_val in [True, False]:
+cursor = hook.BigQueryBaseCursor(mock.Mock(), "project_id")
+cursor.run_query('query',
+ api_resource_configs={
+ 'configuration':
+ {'query': {'useQueryCache': bool_val}}})
+
+args, kwargs = run_with_config.call_args
+self.assertIs(args[0]['query']['useQueryCache'], bool_val)
 
 Review comment:
   and added a check that the default values still exist and were not deleted
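That extra assertion could look like the following self-contained sketch; `merge_query_config` is a hypothetical stand-in for the hook's merge logic, and the loop mirrors the quoted test's iteration over both boolean values:

```python
from copy import deepcopy

def merge_query_config(api_resource_configs, query_default):
    # Hypothetical stand-in: keep user values, fill in missing defaults.
    configuration = deepcopy(api_resource_configs)
    for param, value in query_default.items():
        configuration['query'].setdefault(param, value)
    return configuration

query_default = {'query': 'SELECT 1', 'useLegacySql': True,
                 'priority': 'INTERACTIVE'}
for bool_val in (True, False):
    cfg = merge_query_config({'query': {'useQueryCache': bool_val}},
                             query_default)
    # The user-supplied value survived the merge...
    assert cfg['query']['useQueryCache'] is bool_val
    # ...and the default values still exist and were not deleted.
    assert all(param in cfg['query'] for param in query_default)
```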




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-23 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r212443568
 
 

 ##
 File path: tests/contrib/hooks/test_bigquery_hook.py
 ##
 @@ -281,6 +281,18 @@ def test_run_query_sql_dialect_override(self, 
run_with_config):
 args, kwargs = run_with_config.call_args
 self.assertIs(args[0]['query']['useLegacySql'], bool_val)
 
+@mock.patch.object(hook.BigQueryBaseCursor, 'run_with_configuration')
+def test_api_resource_configs(self, run_with_config):
+for bool_val in [True, False]:
+cursor = hook.BigQueryBaseCursor(mock.Mock(), "project_id")
+cursor.run_query('query',
+ api_resource_configs={
+ 'configuration':
+ {'query': {'useQueryCache': bool_val}}})
+
+args, kwargs = run_with_config.call_args
+self.assertIs(args[0]['query']['useQueryCache'], bool_val)
 
 Review comment:
   resolved by changing the logic as described in the comment above




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-21 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211637030
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -656,6 +671,10 @@ def run_query(self,
 configuration['query'][
 'schemaUpdateOptions'] = schema_update_options
 
+if 'configuration' in api_resource_configs:
+for key in api_resource_configs['configuration']:
+configuration[key] = api_resource_configs['configuration'][key]
 
 Review comment:
   @tswast, please check it




[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-21 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211634204
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -656,6 +671,10 @@ def run_query(self,
 configuration['query'][
 'schemaUpdateOptions'] = schema_update_options
 
+if 'configuration' in api_resource_configs:
 
 Review comment:
   I thought of this param as a way to set, in one dict, any setting supported by 
the API (by analogy with 'src_fmt_configs'), for example 'jobReference' or 
'statistics': https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs. But 
if we say it is only a way to set 'configuration', and this way will not allow 
sending any other params, then it is not needed. What do you think? Use it only for 'configuration'?
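For context, 'configuration' is only one top-level field of a BigQuery jobs resource, alongside siblings such as 'jobReference' and 'statistics'. An abridged sketch of the shape, with hypothetical project and job IDs:

```python
# Abridged top-level shape of a BigQuery jobs resource:
job_resource = {
    'jobReference': {'projectId': 'my-project',   # hypothetical IDs
                     'jobId': 'job_123'},
    'configuration': {
        'query': {'query': 'SELECT 1', 'useQueryCache': True},
    },
}

# If api_resource_configs covered the whole resource, users could also set
# siblings like 'jobReference'; if it maps only to 'configuration', they cannot:
api_resource_configs = job_resource['configuration']
assert 'jobReference' not in api_resource_configs
assert 'useQueryCache' in api_resource_configs['query']
```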








[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-21 Thread GitBox
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211625138
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -656,6 +671,10 @@ def run_query(self,
 configuration['query'][
 'schemaUpdateOptions'] = schema_update_options
 
+if 'configuration' in api_resource_configs:
+for key in api_resource_configs['configuration']:
+configuration[key] = api_resource_configs['configuration'][key]
 
 Review comment:
   mm.. no, it will rewrite only the keys that the user supplied in 
api_resource_configs, or maybe I did not catch what you mean.
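For illustration, the quoted loop does rewrite only the top-level keys the user supplies, but the rewrite is shallow, so defaults nested under an overridden key are dropped. A small sketch, assuming defaults were already filled into configuration earlier in run_query:

```python
# Defaults already placed by run_query (abridged):
configuration = {'query': {'query': 'SELECT 1', 'useLegacySql': True}}
# User-supplied api_resource_configs:
api_resource_configs = {'configuration': {'query': {'useQueryCache': False}}}

for key in api_resource_configs['configuration']:
    configuration[key] = api_resource_configs['configuration'][key]

# Only user-supplied top-level keys were rewritten...
assert set(configuration) == {'query'}
# ...but the rewrite is shallow: nested defaults under 'query' are gone.
assert configuration['query'] == {'useQueryCache': False}
```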

