Re: [PR] Informatica provider [airflow]

2026-02-14 Thread via GitHub


jscheffl commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3902073567

   Besides the optional extra for Airflow core, there is actually no real core 
dependency. So I'd rather not back-port; the only thing lost is the convenience 
of being able to install `apache-airflow[informatica]==3.1.8`. If somebody 
thinks this should be done, let me know and I can backport. Otherwise it will 
be possible from 3.2.0 onwards.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Informatica provider [airflow]

2026-02-14 Thread via GitHub


github-actions[bot] commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3902068883

   ### Backport failed to create: v3-1-test. View the failure log (run details).

   | Status | Branch | Result |
   |--------|--------|--------|
   | ❌ | v3-1-test | https://github.com/apache/airflow/commit/23e980d3b23a35fbbb33509c6ca6df1073159397 |
   
   
   
   You can attempt to backport this manually by running:
   
   ```bash
   cherry_picker 23e980d v3-1-test
   ```
   
   This should apply the commit to the v3-1-test branch and leave the commit in 
conflict state marking
   the files that need manual conflict resolution.
   
   After you have resolved the conflicts, you can continue the backport process 
by running:
   
   ```bash
   cherry_picker --continue
   ```
   
   If you don't have cherry-picker installed, see the [installation 
guide](https://github.com/apache/airflow/blob/main/dev/README_AIRFLOW3_DEV.md#how-to-backport-pr-with-cherry-picker-cli).
   





Re: [PR] Informatica provider [airflow]

2026-02-14 Thread via GitHub


jscheffl merged PR #57610:
URL: https://github.com/apache/airflow/pull/57610





Re: [PR] Informatica provider [airflow]

2026-02-14 Thread via GitHub


jscheffl commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3902067424

   Thanks for all the efforts and being patient! Let's get it rolling!





Re: [PR] Informatica provider [airflow]

2026-02-14 Thread via GitHub


boring-cyborg[bot] commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3902067585

   Awesome work, congrats on your first merged pull request! You are invited to 
check our [Issue Tracker](https://github.com/apache/airflow/issues) for 
additional contributions.
   





Re: [PR] Informatica provider [airflow]

2026-02-14 Thread via GitHub


cetingokhan commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3901980990

   We finally saw the green :), thank you everyone for your support from the 
beginning, it was a wonderful experience for me too.





Re: [PR] Informatica provider [airflow]

2026-02-14 Thread via GitHub


potiuk commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3901857925

   I reran the failed check to be sure.





Re: [PR] Informatica provider [airflow]

2026-02-14 Thread via GitHub


jscheffl commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3901711841

   > The recent errors don't seem to be related to Informatica. Is it mandatory 
for all tests to pass for merge?
   
   No, it's not needed, but it would be better for confidence. The last run just 
had a DB timeout... but last week there were still syntax errors.
   
   @potiuk You still have "request changes" set. Once CI turns green, and given 
that the devlist discussion was positive, can you re-review and update your 
current response? In my view LGTM.





Re: [PR] Informatica provider [airflow]

2026-02-13 Thread via GitHub


cetingokhan commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3901107476

   Hi @jscheffl,
   
   The recent errors don't seem to be related to Informatica. Is it mandatory 
for all tests to pass for merge?





Re: [PR] Informatica provider [airflow]

2026-02-10 Thread via GitHub


cetingokhan commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2790508798


##
providers/informatica/tests/unit/informatica/operators/__init__.py:
##


Review Comment:
   Yes, you're right, the file was added by a hook. I deleted it and pushed 
again. 
   In the end, it seems there are no errors originating from the Informatica 
provider.
   
   Should I update the Breeze documentation as the main branch changes and 
merges are performed, or should I wait until the last minute?









Re: [PR] Informatica provider [airflow]

2026-02-10 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2790033615


##
providers/informatica/tests/unit/informatica/operators/__init__.py:
##


Review Comment:
   Now a new empty folder was added, and an __init__.py was added too, probably 
by a prek hook. I assume that as long as there is no operator, and therefore no 
operator test, the folder and its __init__.py can be removed.






Re: [PR] Informatica provider [airflow]

2026-02-09 Thread via GitHub


jscheffl commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3874060504

   Cassandra driver problems leading to failures in CI seem to be unrelated.





Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


jscheffl commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3867895099

   Note that the build will most probably fail because of a broken main, which 
is being fixed in parallel in https://github.com/apache/airflow/pull/61651





Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779405715


##
.github/CODEOWNERS:
##
@@ -98,6 +98,7 @@ airflow-core/src/airflow/ui/public/i18n/locales/zh-TW/ @Lee-W 
@jason810496 @guan
 /providers/openlineage/ @mobuchowski
 /providers/smtp/ @hussein-awala
 /providers/snowflake/ @potiuk
+/providers/informatica/ @cetingokhan @RNHTTR @sertaykabuk @umutozel

Review Comment:
   Unfortunately only ASF committers can be added. Stewards who are 
non-committers cannot be added :-( It's a technical limitation. Can you do it 
the way we do with translations, where the other names are listed in a comment?
   ```suggestion
   /providers/informatica/ @RNHTTR # + @cetingokhan @sertaykabuk @umutozel
   ```
   
   Additionally, it would be cool to keep the list sorted alphabetically, so 
line 96 would be a better place :-D
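
To illustrate the suggested line format, here is a minimal Python sketch that splits a CODEOWNERS-style line into the path pattern, the owners listed before `#`, and the handles that appear only in the trailing comment. The helper name `split_codeowners_line` is hypothetical, and this is not how GitHub itself parses the file; it only assumes, as the suggestion above does, that everything after `#` is treated as a comment.

```python
from __future__ import annotations


def split_codeowners_line(line: str) -> tuple[str, list[str], list[str]]:
    """Return (path pattern, owners before '#', handles named only in the comment)."""
    rule, _, comment = line.partition("#")
    pattern, *owners = rule.split()
    # Handles after '#' are documentation only (assumption stated in the lead-in).
    stewards = [word for word in comment.split() if word.startswith("@")]
    return pattern, owners, stewards


line = "/providers/informatica/ @RNHTTR # + @cetingokhan @sertaykabuk @umutozel"
pattern, owners, stewards = split_codeowners_line(line)
print(pattern, owners, stewards)
```

With this layout only the committer handle sits in the owner position, while the non-committer stewards stay visible to human readers of the file.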






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


cetingokhan commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3867326051

   Hi @jscheffl,
   
   I have addressed all your other review comments in the latest commit.
   
   I left the conversation regarding the state field open for now, as I am 
waiting for your feedback on my previous reply (about the MVP release 
approach). Once we align on that, I will resolve that thread as well.





Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


cetingokhan commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779347997


##
providers/informatica/tests/integration/informatica/operators/__init__.py:
##


Review Comment:
   folders removed completely!






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


cetingokhan commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779346932


##
providers/informatica/docs/index.rst:
##
@@ -0,0 +1,180 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+``apache-airflow-providers-informatica``
+
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: Basics
+
+   Home 
+   Security 
+   Changelog 
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: Guides
+
+   Usage 
+   API Reference 
+   Configuration 
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: References
+
+   Configuration 
+   Python API <_api/airflow/providers/informatica/index>
+
+.. toctree::
+:hidden:
+:maxdepth: 1
+:caption: Resources
+
+PyPI Repository 

+Installing from sources 
+
+.. toctree::
+:hidden:
+:maxdepth: 1
+:caption: System tests
+
+System Tests <_api/tests/system/informatica/index>
+
+.. toctree::
+:hidden:
+:maxdepth: 1
+:caption: Commits
+
+Detailed list of commits 
+
+Apache Airflow Informatica Provider
+===
+
+**Note:** This provider is not officially maintained or endorsed by 
Informatica. It is a community-developed integration for Apache Airflow.

Review Comment:
   Reaching out to Informatica after the first release was already my priority, 
so getting them involved is definitely the goal. You are right that the 
disclaimer could give them an excuse to avoid responsibility. Removing it makes 
perfect sense. Thanks for the insight!






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779331331


##
providers/informatica/tests/integration/informatica/operators/__init__.py:
##


Review Comment:
   I was referring to the point that there are ZERO system and ZERO integration 
tests, so the whole "system" and "integration" tree can be removed, including 
the existing __init__.py.
   
   Other providers that do not implement integration or system tests do not 
carry these folders either.






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


potiuk commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779287284


##
providers/informatica/tests/integration/informatica/operators/__init__.py:
##


Review Comment:
   I think in this case we **could** use implicit namespaces, but if we do - we 
should do it consistently.






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


potiuk commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779286601


##
providers/informatica/tests/integration/informatica/operators/__init__.py:
##


Review Comment:
   I think they are added automatically and are needed. There is a subtle 
problem with "implicit" namespaces: they do not play well with "explicit" ones, 
and we have rules in `prek` hooks that enforce a consistent pattern. The 
current pattern is that we use namespaces only when we **know** we need them, 
and there we use the **legacy** namespace definition (`__path__ = 
__import__("pkgutil").extend_path(__path__, __name__)`). For example, all 
"airflow.providers" folders should have legacy namespaces, otherwise other 
providers will not be discovered.
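
As a rough illustration of why the legacy namespace line matters, the following Python sketch builds two throwaway directories that both contain a `demo_ns` package whose `__init__.py` holds only the `extend_path` line, and then imports one module from each. The package name `demo_ns`, the module names and the helper functions are made up for the demo; this is not Airflow's actual layout, just the standard-library `pkgutil.extend_path` mechanism in isolation.

```python
from __future__ import annotations

import importlib
import sys
import tempfile
from pathlib import Path

# The legacy namespace line quoted in the comment above.
LEGACY_INIT = '__path__ = __import__("pkgutil").extend_path(__path__, __name__)\n'


def build_distribution(root: Path, module_name: str) -> None:
    """Create root/demo_ns/__init__.py (legacy namespace) plus one module."""
    package_dir = root / "demo_ns"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(LEGACY_INIT)
    (package_dir / f"{module_name}.py").write_text(f'NAME = "{module_name}"\n')


def import_from_both_distributions() -> list[str]:
    """Import demo_ns.alpha and demo_ns.beta from two separate sys.path entries."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [Path(tmp) / "dist_one", Path(tmp) / "dist_two"]
        build_distribution(roots[0], "alpha")
        build_distribution(roots[1], "beta")
        added = [str(root) for root in roots]
        sys.path[:0] = added
        try:
            importlib.invalidate_caches()
            alpha = importlib.import_module("demo_ns.alpha")
            beta = importlib.import_module("demo_ns.beta")
            return [alpha.NAME, beta.NAME]
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for entry in added:
                sys.path.remove(entry)
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_ns"]:
                del sys.modules[name]


print(import_from_both_distributions())
```

Without the `extend_path` line, the first `demo_ns/__init__.py` found on `sys.path` would claim the package and the module in the second directory would not be importable, which is the discovery problem described above.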
   
   






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


potiuk commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779282154


##
providers/informatica/docs/index.rst:
##
@@ -0,0 +1,180 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+``apache-airflow-providers-informatica``
+
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: Basics
+
+   Home 
+   Security 
+   Changelog 
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: Guides
+
+   Usage 
+   API Reference 
+   Configuration 
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: References
+
+   Configuration 
+   Python API <_api/airflow/providers/informatica/index>
+
+.. toctree::
+:hidden:
+:maxdepth: 1
+:caption: Resources
+
+PyPI Repository 

+Installing from sources 
+
+.. toctree::
+:hidden:
+:maxdepth: 1
+:caption: System tests
+
+System Tests <_api/tests/system/informatica/index>
+
+.. toctree::
+:hidden:
+:maxdepth: 1
+:caption: Commits
+
+Detailed list of commits 
+
+Apache Airflow Informatica Provider
+===
+
+**Note:** This provider is not officially maintained or endorsed by 
Informatica. It is a community-developed integration for Apache Airflow.

Review Comment:
   ```suggestion
   ```
   
   I think we should avoid such disclaimers. It's clear that providers released 
by the ASF are released by the ASF (as is all code released by the ASF), but I 
do not see a problem if Informatica people are also called when there is an 
issue. We still need to add the CODEOWNERS entry that @jscheffl was talking 
about. While for now it could be 3rd-party, it would be great if Informatica 
people eventually signed up as stewards, and it would be great if people 
actually called Informatica when there are problems with the provider, because 
it's them who should be incentivised to solve any problems.
   
   As we agreed in our process, the PMC will mostly monitor whether the provider 
gets enough care from whoever the stakeholder is, and we even **want** 
Informatica people to get involved. Such a disclaimer goes against this "wish" 
of ours, and it might mean that the Informatica team will not get engaged 
because of it.
   






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


cetingokhan commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779267543


##
providers/informatica/provider.yaml:
##
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+---
+package-name: apache-airflow-providers-informatica
+name: Informatica Airflow
+description: |
+  `Informatica `__
+
+state: ready

Review Comment:
   This is an initial version that covers the basic Informatica integration. 
Since the current features are stable and working, I believe it is ready for 
release.
   Also, releasing it now will help us catch real-world issues that might be 
missed in development.






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


jscheffl commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3866934154

   As discussed, some remarks should be added in .github/CODEOWNERS until there 
are other structures to document sponsor and steward.





Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779095734


##
providers/informatica/tests/integration/informatica/operators/__init__.py:
##


Review Comment:
   I assume the structure of empty test folders with just an __init__.py can be 
cleaned up. It only needs to be there if there are some real tests.
   
   I'd propose to remove the "system" and "integration" folders completely.






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779087521


##
providers/informatica/provider.yaml:
##
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+---
+package-name: apache-airflow-providers-informatica
+name: Informatica Airflow
+description: |
+  `Informatica `__
+
+state: ready

Review Comment:
   Do providers that enter incubation start in state "ready"? I assume yes, 
just wanted to double-check.






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2779083338


##
providers/informatica/src/airflow/providers/informatica/plugins/informatica.py:
##
@@ -0,0 +1,45 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from airflow.configuration import conf
+from airflow.plugins_manager import AirflowPlugin
+
+is_disabled = conf.getboolean("informatica", "disabled", fallback=False)
+# Conditional imports - only load expensive dependencies when plugin is enabled
+if not is_disabled:
+    from airflow.lineage.hook import HookLineageReader
+    from airflow.providers.informatica.plugins.listener import get_informatica_listener
+
+
+class InformaticaProviderPlugin(AirflowPlugin):
+    """
+    Listener that emits numerous Events.
+
+    Informatica Plugin provides listener that emits OL events on DAG start,
+    complete and failure and TaskInstances start, complete and failure.
+    """
+
+    name: str = "InformaticaProviderPlugin"
+
+    listeners: list = [get_informatica_listener()] if not is_disabled else []
+    hook_lineage_readers: list = [HookLineageReader] if not is_disabled else []
+
+
+    def __init__(self) -> None:
+        """Initialize InformaticaProviderPlugin."""
+        super().__init__()

Review Comment:
   Mhm, comment was resolved but code is still in?






Re: [PR] Informatica provider [airflow]

2026-02-08 Thread via GitHub


jscheffl commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3866899874

   Oh, unfortunately a parallel PR was merged that also re-generated the 
outputs of breeze. Can you run `prek run -a update-breeze-cmd-output` and 
re-commit the changes?





Re: [PR] Informatica provider [airflow]

2026-02-07 Thread via GitHub


cetingokhan commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3866063947

   I have addressed all the CI/static check failures, and I expect all tests to 
pass now.
   
   I had to make a slight adjustment to the OpenLineage tests to resolve a 
conflict. I would really appreciate it if @mobuchowski could review that 
specific change to ensure it is correct.
   
   This has been a challenging but very educational journey for me. Thank you 
again for your patience and support!





Re: [PR] Informatica provider [airflow]

2026-02-07 Thread via GitHub


potiuk commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3864610843

   PR fixing the "not-released-yet-provider" case 
https://github.com/apache/airflow/pull/61597





Re: [PR] Informatica provider [airflow]

2026-02-07 Thread via GitHub


potiuk commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3864557303

   > With the static checks I am not 100% sure. It might be a new "chicken and 
egg" problem. @potiuk do you know why version is removed? I assume a follow-up 
PR after merging is needed to add the version number.
   
   Yeah. Currently we have those min versions based on a cut-off time, and we 
are committed to bumping it regularly to avoid some of the `pip` resolution 
issues that appear when too many versions of providers are considered. See 
`scripts/ci/prek/update_airflow_pyproject_toml.py`
   
   The cut-off date is now set to `2024-10-12T00:00:00Z` and we might want to 
bump it with 3.2 to something in 2025. This is a `soft` way to steer people 
away from older versions of providers by default, without hard-blocking them.
   
   We apparently do not have a process yet to handle the case when a provider is 
**not** yet released at all, and I will PR a fix shortly. In the meantime, 
you might want to add `informatica` to `MIN_VERSION_OVERRIDE` in that file 
@cetingokhan 
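
One plausible reading of the cut-off idea, sketched in Python: a provider's minimum version is the earliest release on or after the cut-off date, unless an override is configured. This is not the code in `scripts/ci/prek/update_airflow_pyproject_toml.py`; the function `pick_min_version`, the override table `MIN_VERSION_OVERRIDE_SKETCH`, the version `0.1.0` and the release dates are all hypothetical, and the real script may select versions differently. Only the cut-off timestamp is taken from the comment above.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Cut-off quoted in the comment above.
CUTOFF = datetime(2024, 10, 12, tzinfo=timezone.utc)

# Hypothetical override table; the real MIN_VERSION_OVERRIDE may look different.
MIN_VERSION_OVERRIDE_SKETCH = {"informatica": "0.1.0"}


def pick_min_version(
    provider: str,
    releases: dict[str, datetime],
    overrides: dict[str, str],
    cutoff: datetime = CUTOFF,
) -> str | None:
    """Return the override, else the earliest release on/after the cut-off, else None."""
    if provider in overrides:
        return overrides[provider]
    eligible = sorted(
        (released, version) for version, released in releases.items() if released >= cutoff
    )
    return eligible[0][1] if eligible else None


# Made-up release history for an already released provider.
example_releases = {
    "1.0.0": datetime(2024, 6, 1, tzinfo=timezone.utc),
    "1.1.0": datetime(2024, 11, 5, tzinfo=timezone.utc),
    "1.2.0": datetime(2025, 2, 1, tzinfo=timezone.utc),
}

print(pick_min_version("example", example_releases, overrides={}))
print(pick_min_version("informatica", {}, overrides={}))
print(pick_min_version("informatica", {}, overrides=MIN_VERSION_OVERRIDE_SKETCH))
```

Under these assumptions a provider with no release at all yields no version, which would match the observed removal of the version from `pyproject.toml`, and an override entry restores one.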





Re: [PR] Informatica provider [airflow]

2026-02-07 Thread via GitHub


jscheffl commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3864351682

   > Static Checks / Pre-commit: When I run pre-commit locally, it modifies 
some breeze files and removes the version information for the Informatica 
provider in pyproject.toml. I was hesitant to include these auto-generated 
changes in my commit. Is this behavior expected? Should I commit these changes?
   > 
   > OpenLineage Test: The 
test_extractor_manager_calls_appropriate_extractor_method in OpenLineage tests 
is failing. I am not sure if my provider triggered this issue. Should I update 
the assertion in that test to fix it?
   > 
   > Core Plugin Test: The TestGetPlugins::test_should_respond_200 test is 
failing because the total plugin count has changed (due to my new plugin). I 
tried to avoid modifying core files, but in this case, should I update the test 
to reflect the new plugin count?
   
   With the static checks I am not 100% sure. It might be a new "chicken and 
egg" problem. @potiuk do you know why version is removed? I assume a follow-up 
PR after merging is needed to add the version number.
   
   Regarding the core tests, you will most probably need to adjust the counts 
to get the tests green. I would see it as a general CI improvement, outside of 
this PR, if the core tests did not depend on the current provider source tree.
   
   With OpenLineage I am not sure, I am not familiar with this topic :-(
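
   On the plugin-count test discussed above: one possible way to make such a 
check less sensitive to new providers is to assert on duplicated plugin names 
instead of on the total. The sketch below uses stand-in objects only; 
`FakePlugin`, `duplicated_plugin_names` and the plugin names are hypothetical 
and do not reflect the real `plugins_manager` API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FakePlugin:
    """Stand-in for a loaded plugin; only the name matters here."""

    name: str


def duplicated_plugin_names(plugins):
    """Return the names that were loaded more than once, sorted."""
    counts = Counter(p.name for p in plugins)
    return sorted(name for name, n in counts.items() if n > 1)


# Adding a new provider plugin changes the total count ...
before = [FakePlugin("plugin_a"), FakePlugin("plugin_b")]
after = before + [FakePlugin("informatica")]

# ... but a check for double imports is unaffected by the new entry.
assert duplicated_plugin_names(before) == []
assert duplicated_plugin_names(after) == []
assert duplicated_plugin_names(after + [FakePlugin("informatica")]) == ["informatica"]
```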





Re: [PR] Informatica provider [airflow]

2026-02-07 Thread via GitHub


cetingokhan commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3864327230

   Hi @jscheffl  and @RNHTTR,
   
   I see three different CI errors and I wanted to ask for your guidance before 
pushing new changes:
   
   Static Checks / Pre-commit: When I run pre-commit locally, it modifies some 
breeze files and removes the version information for the Informatica provider 
in pyproject.toml. I was hesitant to include these auto-generated changes in my 
commit. Is this behavior expected? Should I commit these changes?
   
   OpenLineage Test: The 
test_extractor_manager_calls_appropriate_extractor_method in OpenLineage tests 
is failing. I am not sure if my provider triggered this issue. Should I update 
the assertion in that test to fix it?
   
   Core Plugin Test: The TestGetPlugins::test_should_respond_200 test is 
failing because the total plugin count has changed (due to my new plugin). I 
tried to avoid modifying core files, but in this case, should I update the test 
to reflect the new plugin count?
   
   Thanks for your help!





Re: [PR] Informatica provider [airflow]

2026-02-04 Thread via GitHub


kaxil commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3850493683

   > Cool! Thanks @RNHTTR for raising your hand. Then most probably a pitch to 
the devlist is the next step; maybe an additional sponsor shows up. And after 
some minor nits you could be the first provider here following the new 
contribution model!
   > 
   > @kaxil / @vikramkoka Where do we register the "sponsorship" and "steward" 
roles? In CODEOWNERS like translations, or where should the names go?
   
   Yeah for now CODEOWNERS is fine. We can keep a running record in 
https://github.com/apache/airflow/blob/main/PROVIDERS.rst too -- just a simple 
table for new providers for now





Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


Copilot commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761728647


##
providers/informatica/provider.yaml:
##
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+---
+package-name: apache-airflow-providers-informatica
+name: Informatica Airflow
+description: |
+  `Informatica `__
+
+state: ready
+source-date-epoch: 1758787152
+# Note that those versions are maintained by release manager - do not update them manually
+# with the exception of case where other provider in sources has >= new provider version.
+# In such case adding >= NEW_VERSION and bumping to NEW_VERSION in a provider have
+# to be done in the same PR
+versions:
+  - 0.1.0
+
+integrations:
+  - integration-name: Informatica
+    external-doc-url: https://www.informatica.com/
+    logo: /docs/integration-logos/informatica.png
+    tags: [protocol]
+
+hooks:
+  - integration-name: Informatica
+    python-modules:
+      - airflow.providers.informatica.hooks.edc
+
+connection-types:
+  - hook-class-name: airflow.providers.informatica.hooks.edc.InformaticaEDCHook
+    connection-type: informatica_edc
+
+plugins:
+  - name: informatica
+    plugin-class: airflow.providers.informatica.plugins.InformaticaProviderPlugin
+

Review Comment:
   PR description mentions adding an operator, but this provider metadata 
currently only declares hooks and plugins (no operators are listed here, and 
there is no `operators` module under `providers/informatica/src`). Either add 
the operator(s) and list them under `operators:` or update the PR description 
to match the actual scope.



##
providers/informatica/src/airflow/providers/informatica/plugins/informatica.py:
##
@@ -0,0 +1,43 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from airflow.configuration import conf
+from airflow.plugins_manager import AirflowPlugin
+
+is_disabled = conf.getboolean("informatica", "disabled", fallback=False)
+# Conditional imports - only load expensive dependencies when plugin is enabled
+if not is_disabled:
+    from airflow.providers.common.compat.sdk import HookLineageReader
+    from airflow.providers.informatica.plugins.listener import get_informatica_listener
+
+
+class InformaticaProviderPlugin(AirflowPlugin):
+    """
+    Listener that emits numerous Events.
+
+    Informatica Plugin provides listener that emits OL events on DAG start,
+    complete and failure and TaskInstances start, complete and failure.
+    """

Review Comment:
   This plugin docstring claims it emits events on DAG start/complete/failure, 
but the plugin only registers a task-instance listener. Please adjust the 
docstring (or add DAG-run event handling) so user-facing docs don’t overpromise 
functionality.



##
airflow-core/tests/unit/plugins/test_plugins_manager.py:
##
@@ -359,4 +359,4 @@ def test_does_not_double_import_entrypoint_provider_plugins(self):
         # Mock/skip loading from plugin dir
         with mock.patch("airflow.plugins_manager._load_plugins_from_plugin_directory", return_value=([], [])):
             plugins = plugins_manager._get_plugins()[0]
-        assert len(plugins) == 4
+        assert len(plugins) == 5

Review Comment:
   This test only asserts the total number of loaded plugins, which is brittle 
(changes whenever any provider adds/removes a plugin) and doesn’t directly 
verify the “does not doub

Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


Copilot commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761636653


##
pyproject.toml:
##
@@ -438,6 +441,7 @@ packages = []
     "apache-airflow-providers-http>=4.13.2",
     "apache-airflow-providers-imap>=3.8.0",
     "apache-airflow-providers-influxdb>=2.8.0",
+    "apache-airflow-providers-informatica",
     "apache-airflow-providers-jdbc>=4.5.2",

Review Comment:
   In the global providers dependency list, 
`apache-airflow-providers-informatica` is added without a minimum version, 
while most other providers are pinned with `>=...`. Consider pinning it 
consistently here as well (e.g. `>=0.1.0`) to avoid resolving an unexpected 
older version once it exists on PyPI.



##
providers/informatica/docs/guides/configuration.rst:
##
@@ -0,0 +1,74 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Configuration
+=============
+
+This section describes how to configure the Informatica provider for Apache Airflow.
+
+Connection Setup
+----------------
+
+Create an HTTP connection in Airflow for Informatica EDC:
+
+1. **Connection Type**: ``http``
+2. **Host**: Your EDC server hostname
+3. **Port**: EDC server port (typically 9087)
+4. **Schema**: ``https`` or ``http``
+5. **Login**: EDC username

Review Comment:
   The docs instruct users to create an Airflow connection with type `http`, 
but this provider registers a dedicated connection type `informatica_edc` (see 
provider.yaml) and the hook’s `conn_type` is also `informatica_edc`. This 
inconsistency will confuse users and may prevent the UI from showing the right 
fields/hook mapping. Update the docs to reference `informatica_edc` (or, 
alternatively, change the provider’s registered connection type to `http`, but 
then provider.yaml/get_provider_info.py should match).
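
   As an illustration of the first option, a connection using the dedicated 
type could be supplied through an environment variable, since Airflow accepts 
JSON-serialized connections in `AIRFLOW_CONN_<CONN_ID>` variables. This is 
only a sketch: the connection id `informatica_edc_default`, the host name and 
the credentials are placeholders, and it assumes the hook reads host, port, 
schema and login as listed in the documentation excerpt above.

```python
import json
import os

# Placeholder values; only conn_type is taken from provider.yaml.
connection = {
    "conn_type": "informatica_edc",
    "host": "edc.example.com",
    "port": 9087,
    "schema": "https",
    "login": "edc_user",
    "password": "change-me",
}

conn_id = "informatica_edc_default"  # hypothetical connection id
env_name = f"AIRFLOW_CONN_{conn_id.upper()}"

# Airflow reads this variable when the connection id is looked up.
os.environ[env_name] = json.dumps(connection)
```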



##
providers/informatica/tests/unit/informatica/extractors/test_informatica.py:
##
@@ -0,0 +1,71 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Unit tests for InformaticaLineageExtractor covering all methods.
+"""
+
+from __future__ import annotations
+
+from unittest.mock import patch
+
+import pytest
+
+from airflow.models import Connection
+from airflow.providers.informatica.extractors.informatica import InformaticaLineageExtractor
+from airflow.providers.informatica.hooks.edc import InformaticaEDCHook
+
+
+@pytest.fixture
+def extractor():
+    informatica_hook = InformaticaEDCHook(informatica_edc_conn_id="test_conn")
+    return InformaticaLineageExtractor(edc_hook=informatica_hook)
+
+
+@pytest.fixture(autouse=True)
+def setup_connections(create_connection_without_db):
+    create_connection_without_db(
+        Connection(
+            conn_id="test_conn",
+            conn_type="http",
+            host="testhost",
+            schema="https",

Review Comment:
   The unit test fixture creates the connection with `conn_type="http"`, but 
this provider registers a dedicated connection type `informatica_edc` 
(provider.yaml/get_provider_info.py) and the hook advertises `conn_type = 
"informatica_edc"`. Using the provider-specific conn_type in tests will better 
reflect real-world configuration and catch any conn-type-specific behavior in 
the UI/metadata.



##
pyproject.toml:
##
@@ -246,6 +246,9 @@ packages = []
 "influxdb" = [
     "apache-airflow-providers-influxdb>=2.8.0"
 ]
+"informatica" = [
+    "apache-airflow-providers-informatica"
+]

Review

Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


cetingokhan commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761397549


##
providers/informatica/src/airflow/providers/informatica/plugins/listener.py:
##
@@ -0,0 +1,191 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import logging
+from typing import TYPE_CHECKING
+
+from airflow import settings
+from airflow.listeners import hookimpl
+from airflow.providers.informatica.extractors import InformaticaLineageExtractor
+from airflow.providers.informatica.hooks.edc import InformaticaEDCHook
+
+if TYPE_CHECKING:
+    from airflow.models import TaskInstance
+    from airflow.utils.state import TaskInstanceState
+
+_informatica_listener: InformaticaListener | None = None
+
+
+def _executor_initializer():
+    """
+    Initialize processes for the executor used with DAGRun listener's methods (on scheduler).
+
+    This function must be picklable, so it cannot be defined as an inner method or local function.
+
+    Reconfigures the ORM engine to prevent issues that arise when multiple processes interact with
+    the Airflow database.
+    """
+    # This initializer is used only on the scheduler
+    # We can configure_orm regardless of the Airflow version, as DB access is always allowed from scheduler.
+    settings.configure_orm()
+
+
+class InformaticaListener:
+    """Informatica listener sends events on task instance and dag run starts, completes and failures."""
+
+    def __init__(self):
+        self._executor = None
+        self.log = logging.getLogger(__name__)
+        self.hook = InformaticaLineageExtractor(edc_hook=InformaticaEDCHook())
+        # self.extractor_manager = ExtractorManager()
+
+    @hookimpl
+    def on_task_instance_success(
+        self, previous_state: TaskInstanceState, task_instance: TaskInstance, *args, **kwargs
+    ):
+        self._handle_lineage(task_instance, state="success")
+
+    @hookimpl
+    def on_task_instance_failed(
+        self, previous_state: TaskInstanceState, task_instance: TaskInstance, *args, **kwargs
+    ):
+        self._handle_lineage(task_instance, state="failed")
+
+    @hookimpl
+    def on_task_instance_running(
+        self, previous_state: TaskInstanceState, task_instance: TaskInstance, *args, **kwargs
+    ):
+        self._handle_lineage(task_instance, state="running")
+
+    def _handle_lineage(self, task_instance: TaskInstance, state: str):
+        """
+        Handle lineage resolution for inlets and outlets.
+
+        For each inlet and outlet, resolve Informatica EDC object IDs using getObject.
+        If valid, collect and create lineage links between all valid inlets and outlets.
+        """
+        task = getattr(task_instance, "task", None)
+        if not task:
+            self.log.debug("No task found for TaskInstance %s", task_instance)
+            return
+        inlets = getattr(task, "inlets", getattr(task_instance, "inlets", []))
+        outlets = getattr(task, "outlets", getattr(task_instance, "outlets", []))
+
+        valid_inlets = []  # List of tuples: (uri, object_id)
+        valid_outlets = []
+
+        self.log.info("[InformaticaLineageListener] Task: %s State: %s", task_instance.task_id, state)
+
+        if state != "success":
+            self.log.info("[InformaticaLineageListener] Skipping lineage handling for state: %s", state)
+            return
+
+        for inlet in inlets:
+            inlet_uri = None
+            if isinstance(inlet, dict) and "dataset_uri" in inlet:
+                inlet_uri = inlet["dataset_uri"]
+            elif isinstance(inlet, str):
+                inlet_uri = inlet
+            else:
+                self.log.error("Inlet is not a string or dict with 'dataset_uri': %s", inlet)
+                continue
+            self.log.info("[InformaticaLineageListener] Inlet: %s and type: %s", inlet_uri, type(inlet))
+            try:
+                obj = self.hook.get_object(inlet_uri)
+                if obj and "id" in obj and obj["id"]:
+                    valid_inlets.append((inlet_uri, obj["id"]))
+            except Exception as e:
+                self.log.exception("Failed to resolve inlet %s: %s", inlet_uri

Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761163188


##
providers/informatica/src/airflow/providers/informatica/plugins/listener.py:
##
@@ -0,0 +1,191 @@

Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


cetingokhan commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761131456


##
providers/informatica/src/airflow/providers/informatica/plugins/listener.py:
##
@@ -0,0 +1,191 @@

Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


Copilot commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761089771


##
providers/informatica/provider.yaml:
##
@@ -0,0 +1,69 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+---
+package-name: apache-airflow-providers-informatica
+name: Informatica Airflow
+description: |
+  `Informatica `__
+
+state: ready
+source-date-epoch: 1758787152
+# Note that those versions are maintained by release manager - do not update them manually
+# with the exception of case where other provider in sources has >= new provider version.
+# In such case adding >= NEW_VERSION and bumping to NEW_VERSION in a provider have
+# to be done in the same PR
+versions:
+  - 0.1.0
+
+integrations:
+  - integration-name: Informatica
+    external-doc-url: https://www.informatica.com/
+    logo: /docs/integration-logos/informatica.png
+    tags: [protocol]
+
+operators:
+  - integration-name: Informatica
+    python-modules:
+      - airflow.providers.informatica.operators.empty
+
+plugins:
+  - name: informatica
+    plugin-class: airflow.providers.informatica.plugins.InformaticaProviderPlugin
+

Review Comment:
   `provider.yaml` doesn’t list the hook module nor register a 
`connection-types` entry for `InformaticaEDCHook`, even though the provider 
implements a hook with `conn_type = "informatica_edc"`. Other providers include 
both (e.g. `providers/atlassian/jira/provider.yaml:73-81`). Please add a 
`hooks:` section and a `connection-types:` entry so the hook/connection type is 
discoverable in docs/UI and consistent with the rest of the repo.



##
providers/informatica/pyproject.toml:
##
@@ -0,0 +1,125 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# NOTE! THIS FILE IS AUTOMATICALLY GENERATED AND WILL BE OVERWRITTEN!
+
+# IF YOU WANT TO MODIFY THIS FILE EXCEPT DEPENDENCIES, YOU SHOULD MODIFY THE TEMPLATE
+# `pyproject_TEMPLATE.toml.jinja2` IN the `dev/breeze/src/airflow_breeze/templates` DIRECTORY
+[build-system]
+requires = ["flit_core==3.12.0"]
+build-backend = "flit_core.buildapi"
+
+[project]
+name = "apache-airflow-providers-informatica"
+version = "0.1.0"
+description = "Provider package apache-airflow-providers-informatica for Apache Airflow"
+readme = "README.rst"
+authors = [
+    {name="Apache Software Foundation", email="[email protected]"},
+]
+maintainers = [
+    {name="Apache Software Foundation", email="[email protected]"},
+]
+keywords = [ "airflow-provider", "informatica", "airflow", "integration" ]
+classifiers = [
+    "Development Status :: 5 - Production/Stable",
+    "Environment :: Console",
+    "Environment :: Web Environment",
+    "Intended Audience :: Developers",
+    "Intended Audience :: System Administrators",
+    "Framework :: Apache Airflow",
+    "Framework :: Apache Airflow :: Provider",
+    "License :: OSI Approved :: Apache Software License",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: System :: Monitoring",
+]
+requires-python = ">=3.10"
+
+# The dependencies should be modified in place in the generated file.
+# Any change in the dependencies is preserved when the file is regenerated
+# Make sure to run ``prek update-providers-dependencies --all-files``
+# After you modify the dependencies, and rebuild your Breeze CI im

Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761056279


##
providers/informatica/src/airflow/providers/informatica/plugins/listener.py:
##
@@ -0,0 +1,191 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import logging
+from typing import TYPE_CHECKING
+
+from airflow import settings
+from airflow.listeners import hookimpl
+from airflow.providers.informatica.extractors import InformaticaLineageExtractor
+from airflow.providers.informatica.hooks.edc import InformaticaEDCHook
+
+if TYPE_CHECKING:
+    from airflow.models import TaskInstance
+    from airflow.utils.state import TaskInstanceState
+
+_informatica_listener: InformaticaListener | None = None
+
+
+def _executor_initializer():
+    """
+    Initialize processes for the executor used with DAGRun listener's methods (on scheduler).
+
+    This function must be picklable, so it cannot be defined as an inner method or local function.
+
+    Reconfigures the ORM engine to prevent issues that arise when multiple processes interact with
+    the Airflow database.
+    """
+    # This initializer is used only on the scheduler
+    # We can configure_orm regardless of the Airflow version, as DB access is always allowed from scheduler.
+    settings.configure_orm()
+
+
+class InformaticaListener:
+    """Informatica listener sends events on task instance and dag run starts, completes and failures."""
+
+    def __init__(self):
+        self._executor = None
+        self.log = logging.getLogger(__name__)
+        self.hook = InformaticaLineageExtractor(edc_hook=InformaticaEDCHook())
+        # self.extractor_manager = ExtractorManager()
+
+    @hookimpl
+    def on_task_instance_success(
+        self, previous_state: TaskInstanceState, task_instance: TaskInstance, *args, **kwargs
+    ):
+        self._handle_lineage(task_instance, state="success")
+
+    @hookimpl
+    def on_task_instance_failed(
+        self, previous_state: TaskInstanceState, task_instance: TaskInstance, *args, **kwargs
+    ):
+        self._handle_lineage(task_instance, state="failed")
+
+    @hookimpl
+    def on_task_instance_running(
+        self, previous_state: TaskInstanceState, task_instance: TaskInstance, *args, **kwargs
+    ):
+        self._handle_lineage(task_instance, state="running")
+
+    def _handle_lineage(self, task_instance: TaskInstance, state: str):
+        """
+        Handle lineage resolution for inlets and outlets.
+
+        For each inlet and outlet, resolve Informatica EDC object IDs using getObject.
+        If valid, collect and create lineage links between all valid inlets and outlets.
+        """
+        task = getattr(task_instance, "task", None)
+        if not task:
+            self.log.debug("No task found for TaskInstance %s", task_instance)
+            return
+        inlets = getattr(task, "inlets", getattr(task_instance, "inlets", []))
+        outlets = getattr(task, "outlets", getattr(task_instance, "outlets", []))
+
+        valid_inlets = []  # List of tuples: (uri, object_id)
+        valid_outlets = []
+
+        self.log.info("[InformaticaLineageListener] Task: %s State: %s", task_instance.task_id, state)
+
+        if state != "success":
+            self.log.info("[InformaticaLineageListener] Skipping lineage handling for state: %s", state)
+            return
+
+        for inlet in inlets:
+            inlet_uri = None
+            if isinstance(inlet, dict) and "dataset_uri" in inlet:
+                inlet_uri = inlet["dataset_uri"]
+            elif isinstance(inlet, str):
+                inlet_uri = inlet
+            else:
+                self.log.error("Inlet is not a string or dict with 'dataset_uri': %s", inlet)
+                continue
+            self.log.info("[InformaticaLineageListener] Inlet: %s and type: %s", inlet_uri, type(inlet))
+            try:
+                obj = self.hook.get_object(inlet_uri)
+                if obj and "id" in obj and obj["id"]:
+                    valid_inlets.append((inlet_uri, obj["id"]))
+            except Exception as e:
+                self.log.exception("Failed to resolve inlet %s: %s", inlet_uri, e
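
The inlet branch in the loop above accepts either a plain string or a dict with a `dataset_uri` key and skips anything else. A minimal standalone sketch of that normalization step follows. The helper names `extract_uri` and `collect_uris` are hypothetical and do not appear in the PR, and the EDC lookup through `get_object` is deliberately left out.

```python
from __future__ import annotations

from typing import Any


def extract_uri(entry: Any) -> str | None:
    """Return the URI for an inlet/outlet entry, or None if the shape is unsupported.

    Hypothetical helper: mirrors the branch in the quoted loop, which accepts
    either a plain string or a dict carrying a "dataset_uri" key.
    """
    if isinstance(entry, dict) and "dataset_uri" in entry:
        return entry["dataset_uri"]
    if isinstance(entry, str):
        return entry
    return None


def collect_uris(entries: list[Any]) -> list[str]:
    """Keep only the entries whose URI could be determined, preserving order."""
    uris = []
    for entry in entries:
        uri = extract_uri(entry)
        if uri is not None:
            uris.append(uri)
    return uris
```

Pulling the shape check into one function would let the same logic serve both inlets and outlets, although whether the PR ends up structured this way is not stated in the thread.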

Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761052946


##
providers/informatica/src/airflow/providers/informatica/plugins/listener.py:
##

Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761050777


##
providers/informatica/src/airflow/providers/informatica/plugins/listener.py:
##

Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761040480


##
providers/informatica/src/airflow/providers/informatica/plugins/listener.py:
##
@@ -0,0 +1,191 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import logging
+from typing import TYPE_CHECKING
+
+from airflow import settings
+from airflow.listeners import hookimpl
+from airflow.providers.informatica.extractors import InformaticaLineageExtractor
+from airflow.providers.informatica.hooks.edc import InformaticaEDCHook
+
+if TYPE_CHECKING:
+    from airflow.models import TaskInstance
+    from airflow.utils.state import TaskInstanceState
+
+_informatica_listener: InformaticaListener | None = None
+
+
+def _executor_initializer():
+    """
+    Initialize processes for the executor used with DAGRun listener's methods (on scheduler).
+
+    This function must be picklable, so it cannot be defined as an inner method or local function.
+
+    Reconfigures the ORM engine to prevent issues that arise when multiple processes interact with
+    the Airflow database.
+    """
+    # This initializer is used only on the scheduler
+    # We can configure_orm regardless of the Airflow version, as DB access is always allowed from scheduler.
+    settings.configure_orm()
+

Review Comment:
   Why is the ORM being initialized in the provider? I assume that when running 
in an executor context the ORM is already initialized. If it is not by the time 
execution reaches this point, something is wrong. I'd rather remove it.






Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761033588


##
providers/informatica/src/airflow/providers/informatica/plugins/informatica.py:
##
@@ -0,0 +1,45 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from airflow.configuration import conf
+from airflow.plugins_manager import AirflowPlugin
+
+is_disabled = conf.getboolean("informatica", "disabled", fallback=False)
+# Conditional imports - only load expensive dependencies when plugin is enabled
+if not is_disabled:
+    from airflow.lineage.hook import HookLineageReader
+    from airflow.providers.informatica.plugins.listener import get_informatica_listener
+
+
+class InformaticaProviderPlugin(AirflowPlugin):
+    """
+    Listener that emits numerous Events.
+
+    Informatica Plugin provides listener that emits OL events on DAG start,
+    complete and failure and TaskInstances start, complete and failure.
+    """
+
+    name: str = "InformaticaProviderPlugin"
+
+    listeners: list = [get_informatica_listener()] if not is_disabled else []
+    hook_lineage_readers: list = [HookLineageReader] if not is_disabled else []
+
+
+    def __init__(self) -> None:
+        """Initialize InformaticaProviderPlugin."""
+        super().__init__()

Review Comment:
   The `__init__` function is not needed if it does nothing.
   ```suggestion
   ```
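
To illustrate the point of this comment: a subclass that defines no `__init__` simply inherits its parent's, so an override that only calls `super().__init__()` adds nothing. The sketch below is not code from the PR; `BasePlugin` is a hypothetical stand-in and not Airflow's `AirflowPlugin`.

```python
class BasePlugin:
    """Hypothetical stand-in for a plugin base class (not Airflow's AirflowPlugin)."""

    name: str = ""

    def __init__(self) -> None:
        self.initialized = True


class WithRedundantInit(BasePlugin):
    name = "with_init"

    def __init__(self) -> None:
        # Only forwards to the parent; adds no behaviour of its own.
        super().__init__()


class WithoutInit(BasePlugin):
    # No __init__ defined here: Python falls back to BasePlugin.__init__.
    name = "without_init"
```

Both subclasses end up with `initialized` set on their instances, so the shorter form behaves the same under these assumptions.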






Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761021126


##
airflow-core/docs/extra-packages-ref.rst:
##
@@ -419,6 +419,8 @@ pre-installed when Airflow is installed.
 +-+-+--+--+
 | ssh | ``pip install 'apache-airflow[ssh]'``   | SSH hooks and operators  |  |
 +-+-+--+--+
+| informatica | ``pip install 'apache-airflow[informatica]'``   | Informatica hooks and operators |

Review Comment:
   Splitter / column divider missing
   ```suggestion
   | informatica | ``pip install 'apache-airflow[informatica]'``   | Informatica hooks and operators  |  |
   ```
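
The missing divider can be seen by counting separators: in a reStructuredText grid table, a row with N columns has N + 1 `|` characters, matching the N + 1 `+` characters on the border line. A rough check along those lines is sketched below. The function names are hypothetical, the border widths are illustrative rather than the real file's, and the count assumes cell text contains no `|` or `+`.

```python
def column_count(line: str, separator: str) -> int:
    """Count the columns implied by one grid-table line.

    A line with N columns contains N + 1 separators: "+" on border lines
    and "|" on content rows. Assumes cell text contains no separator.
    """
    return line.strip().count(separator) - 1


def rows_with_wrong_width(border: str, rows: list[str]) -> list[str]:
    """Return the content rows whose column count differs from the border's."""
    expected = column_count(border, "+")
    return [row for row in rows if column_count(row, "|") != expected]


# Illustrative border (widths are assumptions) and the two rows from the diff.
border = "+-----+-----+-----+-----+"
ssh_row = "| ssh | ``pip install 'apache-airflow[ssh]'`` | SSH hooks and operators |  |"
informatica_row = (
    "| informatica | ``pip install 'apache-airflow[informatica]'`` "
    "| Informatica hooks and operators |"
)
```

With these inputs, `rows_with_wrong_width(border, [ssh_row, informatica_row])` returns only the `informatica` row, which has three columns where the border expects four.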






Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2761028878


##
providers/informatica/src/airflow/providers/informatica/operators/empty.py:
##


Review Comment:
   I see no value in adding a second EmptyOperator. I rather think it would 
confuse users if there are two different ones. Note also that the core/standard 
EmptyOperator is optimized: it is not routed to an executor but is acknowledged 
directly in the Scheduler as a no-op.
   
   Provider packages are not required to have operators, and I assume one is 
not needed just for demo purposes to showcase lineage.
   
   TLDR: Please remove this (including the RST docs).






Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


jscheffl commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3843698146

   Cool! Thanks @RNHTTR for raising your hand. Then most probably a pitch to 
the devlist is the next step; maybe an additional sponsor shows up. With some 
minor nits fixed, you could then be the first provider here following the new 
contribution model!
   
   @kaxil / @vikramkoka Where do we register the "sponsorship" and "steward" 
roles? In CODEOWNERS like translations, or where should we place the names?





Re: [PR] Informatica provider [airflow]

2026-02-03 Thread via GitHub


RNHTTR commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3842674222

   > Hello @kaxil,
   > 
   > Thank you for your guidance and for highlighting the new management 
requirements.
   > 
   > I have reviewed the AIP-95 proposal. As you mentioned, I need a Committer 
sponsor to start the discussion on the email list. Are you open to sponsoring 
this provider, or can you suggest someone who might be interested? I can assure 
you I will handle the heavy lifting related to maintaining the provider.
   > 
   > Regarding the irrelevant changes (deleted symbolic links): As you guessed, 
this was an unintended side effect of running on a Windows machine where Git 
mishandles symbolic links. I will definitely clean these up and re-base the PR 
as I progress through the management process.
   > 
   > I look forward to your recommendations regarding sponsorship.
   > 
   > Best regards, Gökhan
   
   @cetingokhan -- I'd be happy to sponsor the provider.





Re: [PR] Informatica provider [airflow]

2026-02-02 Thread via GitHub


cetingokhan commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2753821149


##
providers/informatica/src/airflow/providers/informatica/operators/empty.py:
##


Review Comment:
   I introduced this specific operator to provide better semantic isolation for 
data lineage, acting as a dedicated anchor for inlets and outlets. 
   My goal was to separate these synchronization points from generic tasks, 
especially with future plans to integrate sqlParse for automatic lineage 
extraction from SQL. 
   However, if this approach is considered overkill for the current scope of 
the provider, I am happy to revert to the standard EmptyOperator to keep the 
implementation simpler.






Re: [PR] Informatica provider [airflow]

2026-02-02 Thread via GitHub


jscheffl commented on code in PR #57610:
URL: https://github.com/apache/airflow/pull/57610#discussion_r2753608445


##
providers/informatica/src/airflow/providers/informatica/operators/empty.py:
##


Review Comment:
   Why? Boilerplate?






Re: [PR] Informatica provider [airflow]

2026-01-08 Thread via GitHub


potiuk commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3723484946

   You do not need to have a "sponsor" before you start the discussion. You 
will need to get someone (or several people) committed to oversee your 
maintenance, but it's largely your job to get someone to volunteer. The 
discussion you start should be convincing, and ideally whoever signs up should 
be at least somewhat knowledgeable about it.
   
   The discussion on the devlist is the best way to attract someone. There is 
no single person to whom you can delegate finding that sponsor; convincing a 
volunteer to sign up is really the first task of anyone committing to maintain 
a provider.
   
   And a good way to convince someone is to "do stuff". For example, continue 
rebasing, fixing issues with your PR and making it green; this is one of the 
signals that you are serious about further maintenance.
   
   





Re: [PR] Informatica provider [airflow]

2026-01-06 Thread via GitHub


cetingokhan commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3714817239

   Hello @kaxil,
   
   Thank you for your guidance and for highlighting the new management 
requirements.
   
   I have reviewed the AIP-95 proposal. As you mentioned, I need a Committer 
sponsor to start the discussion on the email list. Are you open to sponsoring 
this provider, or can you suggest someone who might be interested? I can assure 
you I will handle the heavy lifting related to maintaining the provider.
   
   Regarding the irrelevant changes (deleted symbolic links): As you guessed, 
this was an unintended side effect of running on a Windows machine where Git 
mishandles symbolic links. I will definitely clean these up and re-base the PR 
as I progress through the management process.
   
   I look forward to your recommendations regarding sponsorship.
   
   Best regards, Gökhan





Re: [PR] Informatica provider [airflow]

2026-01-06 Thread via GitHub


kaxil commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3714717853

   Hey @cetingokhan,
   
   Following up on Jarek's earlier comment -- the [AIP-95 Provider 
Governance](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-95+Provider+lifecycle+update+proposal)
 vote has concluded and passed 
([result](https://lists.apache.org/thread/qrv0j4dxp2yg09gds40vh49dhkbrj5q9)). 
This establishes a new process for adding providers to Airflow.
   
   **Governance requirements:**
   
   Before we can consider any new provider, contributors need to:
   
   1. Start a discussion thread on [email protected] (you can check the 
archive [here](https://lists.apache.org/[email protected])) 
proposing the provider and explaining the use case
   2. Identify 2 stewards who will commit to long-term maintenance (each 
steward needs sponsorship from an existing Committer)
   3. Accept that the provider starts in Incubation stage with specific 
graduation criteria
   
   This applies to all new provider PRs -- we won't be merging any until the 
proposal goes through the mailing list.
   
   This PR also has unrelated changes that need to be removed:
   - Deleted symlinks in `airflow-core/src/airflow/_shared/` (logging, 
secrets_masker, timezones, example_dags/standard)
   
   These look like local checkout issues that got committed accidentally.
   
   Once you've gone through the governance process on the mailing list, you can 
rebase and clean up this PR.
   
   Let us know if you have any questions or need clarifications.





Re: [PR] Informatica provider [airflow]

2026-01-06 Thread via GitHub


kaxil commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3714711229

   Hey @cetingokhan,
   
   Following up on Jarek's earlier comment -- the [AIP-95 Provider 
Governance](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-95+Provider+lifecycle+update+proposal)
 vote has concluded and passed 
([result](https://lists.apache.org/thread/qrv0j4dxp2yg09gds40vh49dhkbrj5q9)). 
This establishes a new process for adding providers to Airflow.
   
   **Governance requirements:**
   
   Before we can consider any new provider, contributors need to:
   
   1. Start a discussion thread on [email protected] proposing the 
provider and explaining the use case
   2. Identify 2 stewards who will commit to long-term maintenance (each 
steward needs sponsorship from an existing Committer)
   3. Accept that the provider starts in Incubation stage with specific 
graduation criteria
   
   This applies to all new provider PRs -- we won't be merging any until the 
proposal goes through the mailing list.
   
   **Technical feedback on this PR:**
   
   This PR also has unrelated changes that need to be removed:
   - Deleted symlinks in `airflow-core/src/airflow/_shared/` (logging, 
secrets_masker, timezones, example_dags/standard)
   
   These look like local checkout issues that got committed accidentally.
   
   Once you've gone through the governance process on the mailing list, you can 
rebase and clean up this PR.





Re: [PR] Informatica provider [airflow]

2026-01-06 Thread via GitHub


kaxil commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3714702045

   Hey @cetingokhan,
   
   Just following up on Jarek's earlier comment -- the [AIP-95 Provider 
Governance](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-95+Provider+lifecycle+update+proposal)
 vote has now concluded and passed 
([result](https://lists.apache.org/thread/qrv0j4dxp2yg09gds40vh49dhkbrj5q9)).
   
   Before we can consider this PR, you'll need to:
   
   1. Start a discussion thread on [email protected] proposing this provider
   2. Identify 2 stewards who will commit to maintaining it (each needs 
sponsorship from an existing Committer)
   3. The provider would start in Incubation stage
   
   Happy to help if you have questions about the process.





Re: [PR] Informatica provider [airflow]

2025-12-21 Thread via GitHub


github-actions[bot] closed pull request #57610: Informatica provider
URL: https://github.com/apache/airflow/pull/57610





Re: [PR] Informatica provider [airflow]

2025-12-16 Thread via GitHub


github-actions[bot] commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3663009575

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed in 5 days if no further activity occurs. 
Thank you for your contributions.





Re: [PR] Informatica provider [airflow]

2025-10-31 Thread via GitHub


boring-cyborg[bot] commented on PR #57610:
URL: https://github.com/apache/airflow/pull/57610#issuecomment-3472358399

   Congratulations on your first Pull Request and welcome to the Apache Airflow 
community! If you have any issues or are unsure about anything, please check 
our Contributors' Guide 
(https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
   Here are some useful points:
   - Pay attention to the quality of your code (ruff, mypy and type 
annotations). Our [prek-hooks](https://github.com/apache/airflow/blob/main/contributing-docs/08_static_code_checks.rst#prerequisites-for-prek-hooks) 
will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in 
the `docs/` directory). Adding a new operator? Check this short 
[guide](https://github.com/apache/airflow/blob/main/airflow-core/docs/howto/custom-operator.rst). 
Consider adding an example DAG that shows how users should use it.
   - Consider using the [Breeze 
environment](https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst) 
for testing locally. It is a heavy Docker image, but it ships with a working 
Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get 
the final approval from Committers.
   - Please follow the [ASF Code of 
Conduct](https://www.apache.org/foundation/policies/conduct) for all 
communication including (but not limited to) comments on Pull Requests, the 
mailing list and Slack.
   - Be sure to read the [Airflow Coding 
style](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#coding-style-and-best-practices).
   - Always keep your Pull Requests rebased, otherwise your build might fail 
due to changes not related to your commits.
   Apache Airflow is a community-driven project and together we are making it 
better 🚀.
   In case of doubt, contact the developers at:
   Mailing List: [email protected]
   Slack: https://s.apache.org/airflow-slack
   

