Re: [PR] [OpenLineage] Fix datasets in GCSTimeSpanFileTransformOperator [airflow]

2024-04-17 Thread via GitHub


mobuchowski merged PR #39064:
URL: https://github.com/apache/airflow/pull/39064


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [OpenLineage] Fix datasets in GCSTimeSpanFileTransformOperator [airflow]

2024-04-16 Thread via GitHub


kacpermuda opened a new pull request, #39064:
URL: https://github.com/apache/airflow/pull/39064

   Currently, we include every file as a dataset, which can inflate the size
   of the event and make matching datasets between jobs harder.
   
   With this change, we use the prefixes provided by the user as dataset
   names instead of full file paths. This way, the user can easily control
   the size of the event and also ensure proper matching when the same two
   prefixes are passed to different operators. I am also removing the list of
   files that was saved for the purpose of lineage datasets, introduced in
   #35838.
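
To illustrate the difference described above, here is a minimal sketch. It is not the code from this PR: the `Dataset` class and both helper functions are hypothetical stand-ins, and the `gs://{bucket}` namespace format is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a lineage dataset (namespace + name)."""

    namespace: str
    name: str


def datasets_per_file(bucket: str, files: list[str]) -> list[Dataset]:
    # Behaviour before the change, as described above: one dataset per file,
    # so the event grows with the number of files processed.
    return [Dataset(namespace=f"gs://{bucket}", name=path) for path in files]


def dataset_from_prefix(bucket: str, prefix: str) -> Dataset:
    # Behaviour after the change, as described above: a single dataset named
    # after the user-supplied prefix, regardless of how many files match.
    # Assumption: the prefix is used verbatim, without normalization.
    return Dataset(namespace=f"gs://{bucket}", name=prefix)


files = ["data/2024/a.csv", "data/2024/b.csv", "data/2024/c.csv"]
per_file = datasets_per_file("source-bucket", files)
by_prefix = dataset_from_prefix("source-bucket", "data/2024/")
```

In this sketch, `per_file` holds three datasets while `by_prefix` is a single one, and two operators given the same bucket and prefix produce equal `Dataset` values, which is what makes matching between jobs straightforward.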
   
   
   

