Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-29 Thread Jacky Lau
thanks Gyula Fóra. i have read it. And i think it is lark of flink catalog info, which you can see spark atlas project here https://github.com/hortonworks-spark/spark-atlas-connector -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-27 Thread Gyula Fóra
Hi Jack! You can find the document here: https://docs.google.com/document/d/1wSgzPdhcwt-SlNBBqL-Zb7g8fY6bN8JwHEg7GCdsBG8/edit?usp=sharing The document links to an already working Atlas hook prototype (and accompanying flink fork). The links for that are also here: Flink:

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-27 Thread jackylau
Hi Márton Balassi: I am very glad to look at it and where to find . And it is my issue , which you can see https://issues.apache.org/jira/browse/FLINK-16774 -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-27 Thread Márton Balassi
Hi Jack, Yes, we know how to do it and even have the implementation ready and being reviewed by the Atlas community at the moment. :-) Would you be interested in having a look? On Thu, Mar 19, 2020 at 12:56 PM jackylau wrote: > Hi: > i think flink integrate atlas also need add catalog

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-19 Thread jackylau
Hi: i think flink integrate atlas also need add catalog information such as spark atlas project .https://github.com/hortonworks-spark/spark-atlas-connector when user use catalog such as JDBCCatalog/HiveCatalog, flink atlas project will sync this information to atlas. But i don't find any

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-13 Thread Kostas Kloudas
I think that the ExecutorListener idea could work. With a bit more than FLIP-85, it is true that we can get rid of the "exception throwing" environments and we need to introduce an "EmbeddedExecutor" which is going to run on the JM. So, the 2 above, coupled with an ExecutorListener can have the

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-13 Thread Stephan Ewen
Few thoughts on the discussion: ## Changes on the Master If possible, let's avoid changes to the master (JobManager / Dispatcher). These components are complex, we should strive to keep anything out of them that we can keep out of them. ## Problems in different deployments (applications /

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-13 Thread tison
Hi Gyula and all, Thanks for the discussion so far. It seems that the requirement is to deliver some metadata of the submitted job, and such metadata can be simply extracted from StreamGraph. I'm unfamiliar with metadata Atlas needs so I make some assumptions. Assumption: Metadata needed by

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-13 Thread Gyula Fóra
Thanks again Kostas for diving deep into this, it is great feedback! I agree with the concerns regarding the custom executor, it has to be able to properly handle the "original" executor somehow. This might be quite tricky if we want to implement the AtlasExecutor outside Flink. In any case does

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-12 Thread Kostas Kloudas
Thanks Gyula, Looking forward to your comments. Just to let you know, I would not like having a method that in some cases works as expected and in some other ones it does not. It would be nice if we could expose consistent behaviour to the users. On Thu, Mar 12, 2020 at 8:44 PM Gyula Fóra

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-12 Thread Gyula Fóra
Thanks Kostas, I have to review the possible limitations with the Executor before I can properly answer. Regarding you comments for the listener pattern, we proposed in the document to include the getPipeline() in the JobClient itself as you suggested to fit the pattern :) For not always being

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-12 Thread Kostas Kloudas
Hi again, Just to clarify, I am not against exposing the Pipeline if this will lead to a "clean" solution. And, I. forgot to say that the last solution, if adopted, would have to work on the JobGraph, which may not be that desirable. Kostas On Thu, Mar 12, 2020 at 8:26 PM Kostas Kloudas wrote:

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-12 Thread Kostas Kloudas
Hi all, I do not have a strong opinion on the topic yet, but I would like to share my thoughts on this. In the solution proposing a wrapping AtlasExecutor around the Flink Executors, if we allow the user to use the CLI to submit jobs, then this means that the CLI code may have to change so that

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-12 Thread Stephan Ewen
Hi Gyula! My main motivation was to try and avoid mixing an internal interface (Pipeline) with public API. It looks like this is trying to go "public stable", but doesn't really do it exactly because of mixing "pipeline" into this. You would need to cast "Pipeline" and work on internal classes in

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-12 Thread Gyula Fóra
Hi Stephan! Thanks for checking this out. I agree that wrapping the other PipelineExecutors with a delegating AtlasExecutor would be a good alternative approach to implement this but I actually feel that it suffers even more problems than exposing the Pipeline instance in the JobListener. The

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-12 Thread Stephan Ewen
Hi all! In general, nice idea to support this integration with Atlas. I think we could make this a bit easier/lightweight with a small change. One of the issues that is not super nice is that this starts exposing the (currently empty) Pipeline interface in the public API. The Pipeline is an SPI

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-11 Thread Aljoscha Krettek
Thanks! I'm reading the document now and will get back to you. Best, Aljoscha

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-02 Thread Márton Balassi
Hi all, We have added the interface for registering the connectors in custom user user defined functions, like representing enrichment from an HBase table in the middle of a Flink application. We are reaching out to the Atlas community to review the implementation in the near future too, based on

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-20 Thread Gyula Fóra
Hi all! Thank you for the patience! We have created a small design document for the change proposal detailing the minimal required changes in Flink for the initial version of the Atlas integration. You can find the document here:

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-13 Thread Gyula Fóra
Thanks for the feedback Aljoscha! I have a POC ready with the Flink changes + the Atlas hook implementation. I will try to push this to a public repo tomorrow and we can discuss further based on that! Gyula On Thu, Feb 13, 2020, 15:26 Aljoscha Krettek wrote: > I think exposing the Pipeline

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-13 Thread Aljoscha Krettek
I think exposing the Pipeline should be ok. Using the internal StreamGraph might be problematic because this might change/break but that's a problem of the external code. Aljoscha On 11.02.20 16:26, Gyula Fóra wrote: Hi All! I have made a prototype that simply adds a getPipeline() method to

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-11 Thread Gyula Fóra
Hi All! I have made a prototype that simply adds a getPipeline() method to the JobClient interface. Then I could easily implement the Atlas hook using the JobListener interface. I simply check if Pipeline is instanceof StreamGraph and do the logic there. I think this is so far the cleanest

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-07 Thread Gyula Fóra
Maybe we could improve the Pipeline interface in the long run, but as a temporary solution the JobClient could expose a getPipeline() method. That way the implementation of the JobListener could check if its a StreamGraph or a Plan. How bad does that sound? Gyula On Fri, Feb 7, 2020 at 10:19

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-07 Thread Gyula Fóra
Hi Aljoscha! That's a valid concert but we should try to figure something out, many users need this before they can use Flink. I think the closest thing we have right now is the StreamGraph. In contrast with the JobGraph the StreamGraph is pretty nice from a metadata perspective :D The big

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-07 Thread Aljoscha Krettek
If we need it, we can probably beef up the JobListener to allow accessing some information about the whole graph or sources and sinks. My only concern right now is that we don't have a stable interface for our job graphs/pipelines right now. Best, Aljoscha On 06.02.20 23:00, Gyula Fóra

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-06 Thread Gyula Fóra
Hi Jeff & Till! Thanks for the feedback, this is exactly the discussion I was looking for. The JobListener looks very promising if we can expose the JobGraph somehow (correct me if I am wrong but it is not accessible at the moment). I did not know about this feature that's why I added my

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-06 Thread Jeff Zhang
Hi Gyula, Flink 1.10 introduced JobListener which is invoked after job submission and finished. May we can add api on JobClient to get what info you needed for altas integration.

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-06 Thread Till Rohrmann
Hi Gyula, technically speaking the JobGraph is sent to the Dispatcher where a JobMaster is started to execute the JobGraph. The JobGraph comes either from the JobSubmitHandler or the JarRunHandler. Except for creating the ExecutionGraph from the JobGraph there is not much happening on the

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-05 Thread Gyula Fóra
@Till Rohrmann You are completely right that the Atlas hook itself should not live inside Flink. All other hooks for the other projects are implemented as part of Atlas, and the Atlas community is ready to maintain it once we have a working version. The discussion is more about changes that we

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-05 Thread Taher Koitawala
As far as I know, Atlas entries can be created with a rest call. Can we not create an abstracted Flink operator that makes the rest call on job execution/submission? Regards, Taher Koitawala On Wed, Feb 5, 2020, 10:16 PM Flavio Pompermaier wrote: > Hi Gyula, > thanks for taking care of

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-05 Thread Flavio Pompermaier
Hi Gyula, thanks for taking care of integrating Flink with Atlas (and Egeria initiative in the end) that is IMHO the most important part of all the Hadoop ecosystem and that, unfortunately, was quite overlooked. I can confirm that the integration with Atlas/Egeria is absolutely of big interest.

Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-05 Thread Till Rohrmann
Hi Gyula, thanks for starting this discussion. Before diving in the details of how to implement this feature, I wanted to ask whether it is strictly required that the Atlas integration lives within Flink or not? Could it also work if you have tool which receives job submissions, extracts the

[Discussion] Job generation / submission hooks & Atlas integration

2020-02-05 Thread Gyula Fóra
Hi all! We have started some preliminary work on the Flink - Atlas integration at Cloudera. It seems that the integration will require some new hook interfaces at the jobgraph generation and submission phases, so I figured I will open a discussion thread with my initial ideas to get some early