Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-04-19 Thread via GitHub


potiuk merged PR #38408:
URL: https://github.com/apache/airflow/pull/38408


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-04-18 Thread via GitHub


kacpermuda commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1571133881


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Okay, i had a moment to look into it today. I don't think mypy is right 
here. Everything was almost fine. I was wrong when describing the Trigger - 
there is no problem there, it returns a single row with values only (co column 
names), so the code check what it's supposed to check.
   
   When in deferrable mode but not deferred, it's okay too that we are passing 
a single row to the check_value() / _validate_records() function, and this row 
only contains actual values without column names, f.e. `[Row((1, 'test'), 
{'f0_': 0, 'f1_': 1})]`. This means, that the `all()` check works just fine.
   
   The code was just not prepared for an empty result, so i had to adjust it 
and add some tests. I really don't know how to satisfy mypy here. I wasted some 
time there already but it keeps throwing errors so i just left the ignore as it 
is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-04-04 Thread via GitHub


potiuk commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1552213981


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Well. For me it looks like we should fix it in both places -  both seem 
wrong,. Seems like in both cases we get over the error accidentally - because 
single record is a tuple, it's also iterable, so the all() check in 
"validate_records" and the checks done in "check_value" are iterating over 
fields in the record not over the records themselves - and it checks for 
something different than intended.
   
   It seems that (at least that would be logical thing to do here), the right 
approach should be to do:
   
   ```python
   records = [ next(job.result()) ]
   ```
   
   Which would turn the single record into an iterable array (and similar for 
`check_value`) - but that should of course be verified in some real case, if 
what we get is what we wanted to get.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-04-04 Thread via GitHub


potiuk commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1552213981


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Well. For me it looks like we should fix it in both places -  both seem 
wrong,. Seems like in both cases we get over the error accidentally - because 
single record is a tuple, it's also iterable, so the all() check in 
"validate_records" and the checks done in "check_value" are iterating over 
fields in the record not over the records themselves - and it checks for 
something different than intended.
   
   It seems that (at least that would be logical thing to do here), the right 
apprach should be to do:
   
   ```python
   records = [ next(job.result()) ]
   ```
   
   Which would turn the single record into an iterable array (and similar for 
`check_value`) - but that should of course be verified in some real case, if 
what we get is what we wanted to get.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-04-04 Thread via GitHub


potiuk commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1552215444


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   That also might be the reason why we have `# type: ignore[arg-type]` - 
because we are passing wrong type (so mypy ALSO suggest us that we have a 
problem here).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-04-04 Thread via GitHub


potiuk commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1552215444


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   That also might be the reason why we have `# type: ignore[arg-type]` - 
because we are passing wrong type



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-04-04 Thread via GitHub


potiuk commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1552213981


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Well. For me it looks like we should fix it in both places -  both seem 
wrong,. Seems like in both cases we get over the error accidentally - because 
single record is a tuple, it's also iterable, so the all() check in 
"validate_records" and the checks done in "check_value" are iterating over 
fields in the record not over the records themselves - and it checks for 
something different than intended.
   
   It seems that (at least that would be logical thing to do here), the right 
apprach should be to do:
   
   ```python
   records = [ next(job.result() ]
   ```
   
   Which would turn the single record into an iterable array (and similar for 
`check_value`) - but that should of course be verified in some real case, if 
what we get is what we wanted to get.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-04-04 Thread via GitHub


potiuk commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1552213981


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Well. For me it looks like we should fix it in both places -  both seem 
wrong,. Seem like in both cases we seem to get over the error accidentally - 
because single record is a tuple, it's also iterable, so the all() check in 
"validate_records" and the checks done in "check_value" are iterating over 
fields in the record not over the records themselves - and it checks for 
something different than intended.
   
   It seems that (at least that would be logical thing to do here), the right 
apprach should be to do:
   
   ```python
   records = [ next(job.result() ]
   ```
   
   Which would turn the single record into an iterable array (and similar for 
`check_value`) - but that should of course be verified in some real case, if 
what we get is what we wanted to get.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-03-25 Thread via GitHub


kacpermuda commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1537239202


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Yes, `next()` returns only a single row, and we are only checking the first 
row. `validate_records` expect a single row (even though the name of the 
argument is `records`, plural). This is because I was recreating what's 
happening in the 
[Trigger](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/triggers/bigquery.py#L165)
 itself that checks only the first row (but single row is returned as 
`event["records"]`). I was wondering why it happens as the `execute_complete` 
uses the `all(records)` that does not check much in that case, but that's 
probably for another issue. For now, i think when not deferred but in 
deferrable mode, we should recreate what's happening in the trigger to maintain 
the same behaviour.
   
   Also, this `next()` part is copied from BigQueryValueCheckOperator's 
[check](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/operators/bigquery.py#L454),
 so if it's wrong, it should be changed there too.
   
   Maybe that's not the correct attitude, let me know if there is a better way 
to do it 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-03-25 Thread via GitHub


kacpermuda commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1537239202


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Yes, `next()` returns only a single row, and we are only checking the first 
row. `validate_records` expect a single row (even though the name of the 
argument is `records`, plural). This is because I was recreating what's 
happening in the 
[Trigger](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/triggers/bigquery.py#L165)
 itself that checks only the first row (but single row is returned as 
`event["records"]`). I was wondering why it happens as in the execute_complete 
uses the `all(records):` that does not check much in that case, but that's 
probably for another issue. For now, i think when not deferred but in 
deferrable mode, we should recreate what's happening in the trigger to maintain 
the same behaviour.
   
   Also, this `next()` part is copied from BigQueryValueCheckOperator's 
[check](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/operators/bigquery.py#L454),
 so if it's wrong, it should be changed there too.
   
   Maybe that's not the correct attitude, let me know if there is a better way 
to do it 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-03-25 Thread via GitHub


kacpermuda commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1537232852


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Yes, it does return only a single row, and we are only checking the first 
row. I was recreating what's happening in the 
[Trigger](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/triggers/bigquery.py#L165)
 itself that checks only the first row (first row is returned in 
`event["records"]`). I was wondering why it happens as in the execute_complete 
uses the `all(records):` that does not check much in that case, but that's 
probably for another issue. For now, i think when not deferred but in 
deferrable mode, we should recreate what's happening in the trigger to maintain 
the same behaviour.
   
   Also, this `next()` part is copied from BigQueryValueCheckOperator's 
[check](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/operators/bigquery.py#L454),
 so if it's wrong, it should be changed there too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-03-25 Thread via GitHub


kacpermuda commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1537232852


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Yes, it does return only a single row, and we are only checking the first 
row. I was recreating what's happening in the 
[Trigger](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/triggers/bigquery.py#L165)
 itself that checks only the first row (first row is returned in 
`event["records"]`). I was wondering why it happens as in the execute_complete 
uses the `all(records):` that does not check much in that case, but that's 
probably for another issue. For now, i think when not deferred but in 
deferrable mode, we should recreate what's happening in the trigger to maintain 
the same behaviour.
   
   Also, this `next()` part is copied from BigQueryValueCheckOperator's 
[check](https://github.com/apache/airflow/blob/420709962ec864fa62b6e1b6c2f723a2c14cb751/airflow/providers/google/cloud/operators/bigquery.py#L454),
 so if it's wrong, it should be changed there too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-03-22 Thread via GitHub


dirrao commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1536548308


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   next call call might returns a row object not row iterator. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: BigQueryCheckOperator skip value and error check in deferrable mode [airflow]

2024-03-22 Thread via GitHub


potiuk commented on code in PR #38408:
URL: https://github.com/apache/airflow/pull/38408#discussion_r1535855822


##
airflow/providers/google/cloud/operators/bigquery.py:
##
@@ -322,8 +322,26 @@ def execute(self, context: Context):
 ),
 method_name="execute_complete",
 )
+self._handle_job_error(job)
+# job.result() returns a RowIterator. Mypy expects an instance of 
SupportsNext[Any] for
+# the next() call which the RowIterator does not resemble to. 
Hence, ignore the arg-type error.
+records = next(job.result())  # type: ignore[arg-type]

Review Comment:
   Does'nt `next(job.result())` return single Row (if job.result() returnes 
RowIterator)? And validate_records expect Iterator ? (or maybe I am missing 
somethong)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org