[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-29 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-547280956
 
 
   > Unless I switch a pickle flag it complains that it is not JSON 
serializable, the comments in the code indicate pickling is going away for 
Airflow 2.0. I'm trying to ease my migration ahead of time :).
   
   @notatallshaw  I think we should be able to serialize datetimes, could you 
open a ticket for this?
   
   @KevinYang21  No worries, I'd rather have a late reply than no reply. This 
has been reverted anyway and I love your input on it.
   
   > XCOM was designed for inter-task communication and should stay in that way.
   
   This is obvious since before there wasn't really a need to have inter-task 
communication. Let me say that I'm not trying to break idempotency, and we 
should have it by design. However, for efficiency, you might want to cache some 
state, and this can still be idempotent.
   
   My case is that it just feels awkward to create a second table that does 
exactly the same as xcom. What would your opinion on adding an `include_self` 
option to the `include_prior_dates` method? Any I was thinking of adding a 
column to xcom, called `persistent`, which indicates that it shouldn't be 
cleared.
   
   It is evident that I did not think this through, and did not cover all the 
edge cases. I'll try to open up some PR's in the upcoming days to at least 
improve on atomicity and make changes in small steps, I'll include you on the 
PR's as well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-27 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-546707884
 
 
   First of all, you should be able to just serialize the execution_date. 
Instead of converting it to a string, and parse it again.
   
   I don't think your example is complete. What would be the value of the 
`execution_date` for each of the runs? There are two options:
   - They are the same. So on the first run, the `old_xcom` will be None, 
because it never ran before, and it will raise the error. On the second run, 
they will be equal, so the exception will not be raise. Same for the third run.
   - They are monotonically increasing. For example, a daily run. Then the 
first run will still error, and the second run will error as well since the 
first xcom will be smaller than the second.
   
   Bolke triggered me, in combination with your comment about the use of 
`include_prior_dates`.  When you set this option to `True`, which is `False` by 
default, then the `xcom_pull` won't be idempotent anymore for the same 
taskinstance. Which it definitly should. So I'm thinking of making this 
configurable. By adding an option to the `xcom_pull`, for example, 
`include_self` (working name), which implies that it will also include earlier 
xcom's of itself.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-27 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-546676541
 
 
   Just for the record. @bolkedebruin technically you're correct. But I was 
being pragmatic. We don't use the `id` column, so that's why I've dropped it. 
The migration will look like:
   1. Drop the existing index and PK
   1. Drop the id column
   1. Create the PK on the fields on which the index was before
   Recreating such an index can take quite a lot of time, so just using the 
existing table as is, would be in practice exactly the same. Only the newly 
initiated installations would not have any `id` column, which isn't used 
anywhere  However technically, I know, you should not touch existing 
migrations.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-27 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-546672430
 
 
   @notatallshaw I'm pretty sure that this behavior hasn't changed. Before it 
would be cleared, and then you would insert the xcom keys again. Again the only 
thing is that we've changed is instead of clearing them upfront, we'll just 
overwrite the keys.
   
   @bolkedebruin Will do


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-26 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-546640879
 
 
   > So what happens if the task ran successfully first and then was cleared? 
Does that clear XCOM. It should IMHO.
   
   IMHO it should as well, and it does: 
https://github.com/apache/airflow/blob/master/airflow/models/xcom.py#L103-L107
   
   Idempotency is guaranteed, and it is still as idempotent as before. It is 
even more atomic than before, as it is now being done in a single transaction. 
The only chance is that it is done at the end of the run, instead of the 
beginning.
   
   > Btw: am I correct that an alembic migration script was edited rather than 
a new migration was created? That will not work for upgrades.
   
   The change is fully backward compatible. The only chance is getting rid of 
the `id` column because that one wasn't used anywhere in the code. The index on 
there has been replaced by the primary key.
   
   > Also what happened to squashed commits?
   
   There is a new feature in Github that we use `squash and merge`, this will 
make sure that there is only one commit in the target branch. In this case:
   
https://github.com/apache/airflow/commit/74d2a0d9e77cf90b85654f65d1adba0875e0fb1f
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-26 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-546582724
 
 
   The idea started with PR https://github.com/apache/airflow/pull/6210
   
   We needed to store state because we're now having a sensor that allows you 
to give back the slot, and repoke again later on. However, sometimes you need 
to keep state. For example, when you are poking a Dataproc operator, and 
waiting for it to get started, you want to keep the ID of the operator. I've 
suggested this to store this in xcom. Xcom is initially intended to share data 
between operators/senors, but in this case, I think it would be great to also 
use it to share state between operator/sensor instances. It is already in 
there, and you can also look at it through the GUI. If you clear the xcom 
fields upfront, then the state would have been wiped before the next instance 
of the operator/sensor would run.
   
   I've poked Bas and we've had a discussion about why this was in there, and 
what the implications are when doing an upsert of the xcom fields, instead of 
clearing them upfront. We could not come to an obvious reason. To give some 
history, it was initially added 4 yours ago: 
https://github.com/apache/airflow/commit/f238f1d614061573fca48817cbf5314c772d12d2.
 And then it was moved to later in the task process: 
https://github.com/apache/airflow/pull/1951 
   
   Personally I don't expect the user to notice a lot since it will only keep 
the xcom values a little longer.
   
   
![image](https://user-images.githubusercontent.com/1134248/67616669-cc5c2480-f7db-11e9-86d1-89dfcdca7a3a.png)
   
   So there is no async part in there AFAIK, but the clearing of the xcom is 
delayed until it is upserted (atomic :-).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-24 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-545838956
 
 
   Thank you @OmerJog 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-22 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-545123697
 
 
   Actually, the current database layout is backward compatible. We don't use 
the `id` field, and I don't see too much value in writing a complicated 
migration script (since sqlite doesn't support removing columns, and also 
removing the PK's can be tricky in alembic). Ready for review :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-22 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-545043480
 
 
   Test failure looks unrelated, pulled in master:
   ```
  Summary of failed tests
   
tests.gcp.hooks.test_gcs.TestGoogleCloudStorageHookUpload.test_upload_data_str_gzip
 Failure:builtins.AssertionError
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution

2019-10-20 Thread GitBox
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly 
before execution
URL: https://github.com/apache/airflow/pull/6370#issuecomment-544229682
 
 
   Yes. I also still have to do the database migrations. The issue here is that 
for sqlite the migrations are quite limited, and don't support dropping of 
columns. The changes to the xcom table are not breaking any queries, since 
we're not using the `id` field anywhere, and hopefully, the PK uniqueness 
constraints aren't violated anywhere :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services