Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-04-07 Thread via GitHub
uranusjr closed issue #37810: Annotate a Dataset Event in the Source Task URL: https://github.com/apache/airflow/issues/37810 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-31 Thread via GitHub
uranusjr commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-2028991227 Core mechanism to set `DatasetEvent.extra` is implemented in #38481. I’ll move to implementing `yield Metadata(...)` from a task next. This might take a while since I can see how
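
For readers following along, here is a rough, plain-Python sketch of the general idea behind `yield Metadata(...)`: a task written as a generator yields metadata objects, and whatever runs the task collects them per dataset. This is not Airflow code and not the implementation from #38481. The `Metadata` class, the `run_task` helper, the example URIs, and the choice to treat the generator's return value as the task result are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Generator, Tuple


@dataclass
class Metadata:
    """Hypothetical stand-in: extra data to attach to one dataset's event."""

    uri: str
    extra: Dict[str, Any] = field(default_factory=dict)


def run_task(
    task: Callable[[], Generator[Metadata, None, Any]],
) -> Tuple[Any, Dict[str, Dict[str, Any]]]:
    """Hypothetical runner: drive a generator task and collect yielded extras.

    Returns the generator's return value together with a mapping of
    dataset URI to the merged extra dictionary for that dataset.
    """
    extras: Dict[str, Dict[str, Any]] = {}
    gen = task()
    while True:
        try:
            item = next(gen)
        except StopIteration as stop:
            # Assumption: the generator's return value is the task result.
            return stop.value, extras
        # Merge extras so one task can annotate several datasets.
        extras.setdefault(item.uri, {}).update(item.extra)


def produce() -> Generator[Metadata, None, str]:
    """Example task that writes two (made-up) datasets and annotates each."""
    yield Metadata("s3://bucket/a.csv", {"row_count": 42})
    yield Metadata("s3://bucket/b.csv", {"row_count": 7})
    return "done"


result, extras = run_task(produce)
```

Yielding one object per dataset is one way to handle a task that writes to more than one dataset, which came up earlier in the thread.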

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-23 Thread via GitHub
jscheffl commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-2016560016 > This also opens the door for sending multiple things from one single function if we allow `yield Output(...)`. I can think of future extensions that the return value does not go

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-21 Thread via GitHub
uranusjr commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-2014286445 I gave this a pretty long thought. I am leaning to implementing the `return Metadata(...)` syntax mentioned above, but with a little flair to solve the issue it conflicts with XCo

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-12 Thread via GitHub
uranusjr commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-1992043317 > if task return value (==XCom) shall be taken over as `extra` event data. So if the marker is set, the return value goes to the dataset event’s extra, _instead of_ (not in

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-07 Thread via GitHub
jscheffl commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-1984323146 > With that established, if we store extra metadata (of a dataset), it only makes sense to allow extra metadata also when an XCom is written. But if we use XCom for the extra,

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-07 Thread via GitHub
uranusjr commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-1982982809 Since XCom is just data storage, it can be used like an external S3 file, or a database the user sets up. It is just a bit more automated and contains some metadata. I feel it i

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-05 Thread via GitHub
jscheffl commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-1979621553 > I like the idea. How would this work if the task writes to more than one dataset though? I believe it might be an option, as an extension, to also be able to pick which XCom as a

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-04 Thread via GitHub
uranusjr commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-1977590101 I like the idea. How would this work if the task writes to more than one dataset though? Another thing I’ve been thinking is to give XCom a dataset URI so we can track line

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-04 Thread via GitHub
jscheffl commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-1977331326 Hi @uranusjr I have been thinking about the same (or a similar) feature for many weeks, especially in data-driven use cases. We also have a DAG that potentially generates dataset events -

Re: [I] Annotate a Dataset Event in the Source Task [airflow]

2024-03-01 Thread via GitHub
jedcunningham commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-1973379755 LGTM, looking forward to this 👍

[I] Annotate a Dataset Event in the Source Task [airflow]

2024-02-29 Thread via GitHub
uranusjr opened a new issue, #37810: URL: https://github.com/apache/airflow/issues/37810 ### Description To eventually support the construct and UI we’re aiming for in assets, we need to attach metadata to the actual data, not the task that produces it, nor the location it is written