Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]
potiuk merged PR #38408: URL: https://github.com/apache/airflow/pull/38408

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]
kacpermuda commented on code in PR #38408: URL: https://github.com/apache/airflow/pull/38408#discussion_r1571133881

## airflow/providers/google/cloud/operators/bigquery.py:

```diff
@@ -322,8 +322,26 @@ def execute(self, context: Context):
                 ),
                 method_name="execute_complete",
             )
+        self._handle_job_error(job)
+        # job.result() returns a RowIterator. Mypy expects an instance of SupportsNext[Any] for
+        # the next() call, which the RowIterator does not resemble. Hence, ignore the arg-type error.
+        records = next(job.result())  # type: ignore[arg-type]
```

Review Comment: Okay, I had a moment to look into it today. I don't think mypy is right here; everything was almost fine. I was wrong when describing the Trigger: there is no problem there. It returns a single row with values only (no column names), so the code checks what it's supposed to check. When in deferrable mode but not deferred, it is also fine that we pass a single row to the check_value() / _validate_records() functions, because that row contains only the actual values without column names, e.g. `[Row((1, 'test'), {'f0_': 0, 'f1_': 1})]`. This means the `all()` check works just fine. The code was simply not prepared for an empty result, so I had to adjust it and add some tests. I really don't know how to satisfy mypy here. I already spent some time on it, but it keeps throwing errors, so I just left the ignore as it is.
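The behaviour described above can be sketched without the BigQuery client (assumption: a plain tuple stands in for a `google.cloud.bigquery.table.Row`, which likewise iterates over its values; `validate_records_sketch` is a hypothetical stand-in for the operator's check). It shows why `all()` over a single row of values works, and why an empty result needs an explicit guard, since `all(())` is vacuously `True`:

```python
def validate_records_sketch(records):
    # Guard against an empty result: without this, all(()) would
    # silently pass, because all() over an empty iterable is True.
    if not records:
        raise ValueError("The query returned empty results")
    # Mirrors the check: every value in the (single) row must be truthy.
    if not all(records):
        raise ValueError(f"Test failed. Query records: {records!r}")

# Stand-in for Row((1, 'test'), {'f0_': 0, 'f1_': 1}): values only, no column names.
validate_records_sketch((1, "test"))   # passes: all values truthy

try:
    validate_records_sketch(())        # empty result is rejected explicitly
except ValueError as exc:
    print(exc)
```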
Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]
potiuk commented on code in PR #38408: URL: https://github.com/apache/airflow/pull/38408#discussion_r1552213981

## airflow/providers/google/cloud/operators/bigquery.py:

```diff
@@ -322,8 +322,26 @@ def execute(self, context: Context):
                 ),
                 method_name="execute_complete",
             )
+        self._handle_job_error(job)
+        # job.result() returns a RowIterator. Mypy expects an instance of SupportsNext[Any] for
+        # the next() call, which the RowIterator does not resemble. Hence, ignore the arg-type error.
+        records = next(job.result())  # type: ignore[arg-type]
```

Review Comment: Well, for me it looks like we should fix it in both places; both seem wrong. It seems that in both cases we get past the error accidentally: because a single record is a tuple, it is also iterable, so the `all()` check in `_validate_records` and the checks done in `check_value` iterate over the fields of the record, not over the records themselves, and they check something different than intended. It seems that (at least it would be the logical thing to do here) the right approach is:

```python
records = [next(job.result())]
```

which turns the single record into a one-element list (and similarly for `check_value`), but that should of course be verified in some real case, to confirm that what we get is what we wanted to get.
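The field-level versus row-level distinction above can be illustrated with plain tuples (an assumption standing in for BigQuery `Row` objects; none of the real operator code is used):

```python
# One row that contains a falsy field.
row = (1, 0, "test")

# Passing the single row directly: all() iterates over the *fields*,
# so the falsy 0 makes the check fail at field level.
print(all(row))        # False

# Wrapped in a list, as suggested: all() iterates over the *rows*,
# and any non-empty tuple is truthy, so the check passes at row level.
records = [row]
print(all(records))    # True
```

The two calls answer differently for the same data, which is exactly why the check inspects something different than intended depending on what is passed in.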
Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]
potiuk commented on code in PR #38408: URL: https://github.com/apache/airflow/pull/38408#discussion_r1552215444

## airflow/providers/google/cloud/operators/bigquery.py:

```diff
@@ -322,8 +322,26 @@ def execute(self, context: Context):
                 ),
                 method_name="execute_complete",
             )
+        self._handle_job_error(job)
+        # job.result() returns a RowIterator. Mypy expects an instance of SupportsNext[Any] for
+        # the next() call, which the RowIterator does not resemble. Hence, ignore the arg-type error.
+        records = next(job.result())  # type: ignore[arg-type]
```

Review Comment: That also might be the reason why we have `# type: ignore[arg-type]`: because we are passing the wrong type (so mypy is ALSO suggesting that we have a problem here).
Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]
kacpermuda commented on code in PR #38408: URL: https://github.com/apache/airflow/pull/38408#discussion_r1537239202

## airflow/providers/google/cloud/operators/bigquery.py:

```diff
@@ -322,8 +322,26 @@ def execute(self, context: Context):
                 ),
                 method_name="execute_complete",
             )
+        self._handle_job_error(job)
+        # job.result() returns a RowIterator. Mypy expects an instance of SupportsNext[Any] for
+        # the next() call, which the RowIterator does not resemble. Hence, ignore the arg-type error.
+        records = next(job.result())  # type: ignore[arg-type]
```

Review Comment: Yes, `next()` returns only a single row, and we are only checking the first row. `validate_records` expects a single row (even though the name of the argument is `records`, plural). This is because I was recreating what happens in the [Trigger](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/triggers/bigquery.py#L165) itself, which checks only the first row (but that single row is returned as `event["records"]`). I was wondering why that works, since `execute_complete` uses `all(records)`, which does not check much in that case, but that's probably for another issue. For now, I think that when in deferrable mode but not deferred, we should recreate what happens in the trigger to maintain the same behaviour. Also, this `next()` part is copied from BigQueryValueCheckOperator's [check](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/operators/bigquery.py#L454), so if it's wrong, it should be changed there too. Maybe that's not the correct approach, let me know if there is a better way to do it 😄
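The "check only the first row" behaviour described here can be sketched with stand-in iterators (not the real `RowIterator`; `first_row_or_none` is a hypothetical helper). Using `next()` with a default avoids a `StopIteration` on an empty result set:

```python
def first_row_or_none(row_iterator):
    # next() with a default returns None instead of raising StopIteration
    # when the result set is empty.
    return next(iter(row_iterator), None)


print(first_row_or_none(iter([(1, "test"), (2, "x")])))  # (1, 'test')
print(first_row_or_none(iter([])))                       # None
```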
Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]
kacpermuda commented on code in PR #38408: URL: https://github.com/apache/airflow/pull/38408#discussion_r1537232852

## airflow/providers/google/cloud/operators/bigquery.py:

```diff
@@ -322,8 +322,26 @@ def execute(self, context: Context):
                 ),
                 method_name="execute_complete",
             )
+        self._handle_job_error(job)
+        # job.result() returns a RowIterator. Mypy expects an instance of SupportsNext[Any] for
+        # the next() call, which the RowIterator does not resemble. Hence, ignore the arg-type error.
+        records = next(job.result())  # type: ignore[arg-type]
```

Review Comment: Yes, it returns only a single row, and we are only checking the first row. I was recreating what happens in the [Trigger](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/triggers/bigquery.py#L165) itself, which checks only the first row (the first row is returned in `event["records"]`). I was wondering why that works, since `execute_complete` uses `all(records)`, which does not check much in that case, but that's probably for another issue. For now, I think that when in deferrable mode but not deferred, we should recreate what happens in the trigger to maintain the same behaviour. Also, this `next()` part is copied from BigQueryValueCheckOperator's [check](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/operators/bigquery.py#L454), so if it's wrong, it should be changed there too.
Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]
dirrao commented on code in PR #38408: URL: https://github.com/apache/airflow/pull/38408#discussion_r1536548308

## airflow/providers/google/cloud/operators/bigquery.py:

```diff
@@ -322,8 +322,26 @@ def execute(self, context: Context):
                 ),
                 method_name="execute_complete",
             )
+        self._handle_job_error(job)
+        # job.result() returns a RowIterator. Mypy expects an instance of SupportsNext[Any] for
+        # the next() call, which the RowIterator does not resemble. Hence, ignore the arg-type error.
+        records = next(job.result())  # type: ignore[arg-type]
```

Review Comment: The next() call might return a Row object, not a RowIterator.
Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]
potiuk commented on code in PR #38408: URL: https://github.com/apache/airflow/pull/38408#discussion_r1535855822

## airflow/providers/google/cloud/operators/bigquery.py:

```diff
@@ -322,8 +322,26 @@ def execute(self, context: Context):
                 ),
                 method_name="execute_complete",
             )
+        self._handle_job_error(job)
+        # job.result() returns a RowIterator. Mypy expects an instance of SupportsNext[Any] for
+        # the next() call, which the RowIterator does not resemble. Hence, ignore the arg-type error.
+        records = next(job.result())  # type: ignore[arg-type]
```

Review Comment: Doesn't `next(job.result())` return a single Row (if `job.result()` returns a RowIterator)? And `validate_records` expects an iterator? (Or maybe I am missing something.)
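The answer to this question can be confirmed with a plain iterator (a stand-in for the `RowIterator` returned by `job.result()`; tuples stand in for `Row` objects):

```python
rows = iter([(1, "a"), (2, "b")])    # stand-in for a RowIterator

first = next(rows)
print(first)                         # (1, 'a') -- a single row, not an iterator

# A tuple is iterable but is not itself an iterator.
print(hasattr(first, "__next__"))    # False
```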