[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-547280956 > Unless I switch a pickle flag it complains that it is not JSON serializable, the comments in the code indicate pickling is going away for Airflow 2.0. I'm trying to ease my migration ahead of time :). @notatallshaw I think we should be able to serialize datetimes, could you open a ticket for this? @KevinYang21 No worries, I'd rather have a late reply than no reply. This has been reverted anyway and I love your input on it. > XCOM was designed for inter-task communication and should stay in that way. This is obvious since before there wasn't really a need to have inter-task communication. Let me say that I'm not trying to break idempotency, and we should have it by design. However, for efficiency, you might want to cache some state, and this can still be idempotent. My case is that it just feels awkward to create a second table that does exactly the same as xcom. What would your opinion on adding an `include_self` option to the `include_prior_dates` method? Any I was thinking of adding a column to xcom, called `persistent`, which indicates that it shouldn't be cleared. It is evident that I did not think this through, and did not cover all the edge cases. I'll try to open up some PR's in the upcoming days to at least improve on atomicity and make changes in small steps, I'll include you on the PR's as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-546707884 First of all, you should be able to just serialize the execution_date. Instead of converting it to a string, and parse it again. I don't think your example is complete. What would be the value of the `execution_date` for each of the runs? There are two options: - They are the same. So on the first run, the `old_xcom` will be None, because it never ran before, and it will raise the error. On the second run, they will be equal, so the exception will not be raise. Same for the third run. - They are monotonically increasing. For example, a daily run. Then the first run will still error, and the second run will error as well since the first xcom will be smaller than the second. Bolke triggered me, in combination with your comment about the use of `include_prior_dates`. When you set this option to `True`, which is `False` by default, then the `xcom_pull` won't be idempotent anymore for the same taskinstance. Which it definitly should. So I'm thinking of making this configurable. By adding an option to the `xcom_pull`, for example, `include_self` (working name), which implies that it will also include earlier xcom's of itself. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-546676541 Just for the record. @bolkedebruin technically you're correct. But I was being pragmatic. We don't use the `id` column, so that's why I've dropped it. The migration will look like: 1. Drop the existing index and PK 1. Drop the id column 1. Create the PK on the fields on which the index was before Recreating such an index can take quite a lot of time, so just using the existing table as is, would be in practice exactly the same. Only the newly initiated installations would not have any `id` column, which isn't used anywhere However technically, I know, you should not touch existing migrations. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-546672430 @notatallshaw I'm pretty sure that this behavior hasn't changed. Before it would be cleared, and then you would insert the xcom keys again. Again the only thing is that we've changed is instead of clearing them upfront, we'll just overwrite the keys. @bolkedebruin Will do This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-546640879 > So what happens if the task ran successfully first and then was cleared? Does that clear XCOM. It should IMHO. IMHO it should as well, and it does: https://github.com/apache/airflow/blob/master/airflow/models/xcom.py#L103-L107 Idempotency is guaranteed, and it is still as idempotent as before. It is even more atomic than before, as it is now being done in a single transaction. The only chance is that it is done at the end of the run, instead of the beginning. > Btw: am I correct that an alembic migration script was edited rather than a new migration was created? That will not work for upgrades. The change is fully backward compatible. The only chance is getting rid of the `id` column because that one wasn't used anywhere in the code. The index on there has been replaced by the primary key. > Also what happened to squashed commits? There is a new feature in Github that we use `squash and merge`, this will make sure that there is only one commit in the target branch. In this case: https://github.com/apache/airflow/commit/74d2a0d9e77cf90b85654f65d1adba0875e0fb1f This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-546582724 The idea started with PR https://github.com/apache/airflow/pull/6210 We needed to store state because we're now having a sensor that allows you to give back the slot, and repoke again later on. However, sometimes you need to keep state. For example, when you are poking a Dataproc operator, and waiting for it to get started, you want to keep the ID of the operator. I've suggested this to store this in xcom. Xcom is initially intended to share data between operators/senors, but in this case, I think it would be great to also use it to share state between operator/sensor instances. It is already in there, and you can also look at it through the GUI. If you clear the xcom fields upfront, then the state would have been wiped before the next instance of the operator/sensor would run. I've poked Bas and we've had a discussion about why this was in there, and what the implications are when doing an upsert of the xcom fields, instead of clearing them upfront. We could not come to an obvious reason. To give some history, it was initially added 4 yours ago: https://github.com/apache/airflow/commit/f238f1d614061573fca48817cbf5314c772d12d2. And then it was moved to later in the task process: https://github.com/apache/airflow/pull/1951 Personally I don't expect the user to notice a lot since it will only keep the xcom values a little longer. ![image](https://user-images.githubusercontent.com/1134248/67616669-cc5c2480-f7db-11e9-86d1-89dfcdca7a3a.png) So there is no async part in there AFAIK, but the clearing of the xcom is delayed until it is upserted (atomic :-). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-545838956 Thank you @OmerJog This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-545123697 Actually, the current database layout is backward compatible. We don't use the `id` field, and I don't see too much value in writing a complicated migration script (since sqlite doesn't support removing columns, and also removing the PK's can be tricky in alembic). Ready for review :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-545043480 Test failure looks unrelated, pulled in master: ``` Summary of failed tests tests.gcp.hooks.test_gcs.TestGoogleCloudStorageHookUpload.test_upload_data_str_gzip Failure:builtins.AssertionError ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-544229682 Yes. I also still have to do the database migrations. The issue here is that for sqlite the migrations are quite limited, and don't support dropping of columns. The changes to the xcom table are not breaking any queries, since we're not using the `id` field anywhere, and hopefully, the PK uniqueness constraints aren't violated anywhere :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services