Thanks Gyula Fóra, I have read it. I think it lacks the Flink catalog
info, which you can see in the Spark Atlas project here:
https://github.com/hortonworks-spark/spark-atlas-connector
Hi Jack!
You can find the document here:
https://docs.google.com/document/d/1wSgzPdhcwt-SlNBBqL-Zb7g8fY6bN8JwHEg7GCdsBG8/edit?usp=sharing
The document links to an already working Atlas hook prototype (and an
accompanying Flink fork). The links for that are also here:
Flink:
Hi Márton Balassi,
I would be very glad to look at it; where can I find it?
And it is my issue, which you can see here:
https://issues.apache.org/jira/browse/FLINK-16774
Hi Jack,
Yes, we know how to do it and even have the implementation ready and being
reviewed by the Atlas community at the moment. :-)
Would you be interested in having a look?
On Thu, Mar 19, 2020 at 12:56 PM jackylau wrote:
> Hi,
> I think the Flink Atlas integration also needs to add catalog
Hi,
I think the Flink Atlas integration also needs to add catalog information, as in the
Spark Atlas project:
https://github.com/hortonworks-spark/spark-atlas-connector
When a user uses a catalog such as JDBCCatalog/HiveCatalog, the Flink Atlas
hook should sync this information to Atlas.
But I don't find any
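For context, a minimal sketch of the kind of catalog usage this refers to, using the Table API; the catalog name, database, and hive-conf path are placeholder assumptions, and the three-argument HiveCatalog constructor is the one documented in recent Flink versions:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class CatalogUsageSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().build());

        // Placeholder names and paths.
        HiveCatalog hive = new HiveCatalog(
                "myhive",         // catalog name
                "default",        // default database
                "/opt/hive-conf"  // directory containing hive-site.xml
        );
        tableEnv.registerCatalog("myhive", hive);
        tableEnv.useCatalog("myhive");
        // Tables resolved through this catalog carry database/table/schema
        // metadata that an Atlas hook would want to sync as lineage.
    }
}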
I think that the ExecutorListener idea could work.
With a bit more than FLIP-85, it is true that we can get rid of the
"exception throwing" environments, and we need to introduce an
"EmbeddedExecutor" which is going to run on the JM. So the two above,
coupled with an ExecutorListener, can have the
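Since the ExecutorListener is only a proposal at this point and not an existing Flink interface, here is a purely hypothetical sketch of its shape (all names are assumptions drawn from this thread):

import org.apache.flink.api.dag.Pipeline;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.execution.JobClient;

// Hypothetical interface sketched from the discussion; not part of Flink.
public interface ExecutorListener {

    // Invoked by every executor (including the proposed JM-side
    // "EmbeddedExecutor") before the pipeline is translated and submitted.
    void onPipelineSubmitted(Pipeline pipeline, Configuration configuration);

    // Invoked once submission has completed and a JobClient is available.
    void onJobSubmitted(JobClient jobClient);
}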
Few thoughts on the discussion:
## Changes on the Master
If possible, let's avoid changes to the master (JobManager / Dispatcher).
These components are complex; we should strive to keep anything out of them
that we can.
## Problems in different deployments (applications /
Hi Gyula and all,
Thanks for the discussion so far.
It seems that the requirement is to deliver some metadata about the submitted
job,
and such metadata can simply be extracted from the StreamGraph.
I'm unfamiliar with the metadata Atlas needs, so I'll make some assumptions.
Assumption:
Metadata needed by
Thanks again Kostas for diving deep into this; it is great feedback!
I agree with the concerns regarding the custom executor: it has to be able
to properly handle the "original" executor somehow.
This might be quite tricky if we want to implement the AtlasExecutor
outside Flink. In any case does
Thanks Gyula,
Looking forward to your comments.
Just to let you know, I would not like having a method that in some
cases works as expected and in others does not. It would
be nice if we could expose consistent behaviour to the users.
On Thu, Mar 12, 2020 at 8:44 PM Gyula Fóra
Thanks Kostas, I have to review the possible limitations with the Executor
before I can properly answer.
Regarding your comments on the listener pattern: we proposed in the
document to include getPipeline() in the JobClient itself, as you
suggested, to fit the pattern :) For not always being
Hi again,
Just to clarify, I am not against exposing the Pipeline if this will
lead to a "clean" solution.
And I forgot to say that the last solution, if adopted, would have
to work on the JobGraph, which may not be that desirable.
Kostas
On Thu, Mar 12, 2020 at 8:26 PM Kostas Kloudas wrote:
Hi all,
I do not have a strong opinion on the topic yet, but I would like to
share my thoughts on this.
In the solution proposing an AtlasExecutor wrapping the Flink
executors, if we allow the user to use the CLI to submit jobs, then
this means that the CLI code may have to change so that
Hi Gyula!
My main motivation was to try and avoid mixing an internal interface
(Pipeline) with the public API. It looks like this is trying to go "public
stable", but doesn't really do it exactly because of mixing "pipeline" into
this.
You would need to cast "Pipeline" and work on internal classes in
Hi Stephan!
Thanks for checking this out. I agree that wrapping the other
PipelineExecutors with a delegating AtlasExecutor would be a good
alternative approach to implement this, but I actually feel that it suffers
from even more problems than exposing the Pipeline instance in the JobListener.
The
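For concreteness, a rough sketch of the delegating-executor alternative, assuming the two-argument PipelineExecutor#execute signature of the Flink 1.10 era (later versions add a class loader argument); the reportToAtlas helper is a placeholder, not a real API:

import java.util.concurrent.CompletableFuture;
import org.apache.flink.api.dag.Pipeline;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.PipelineExecutor;

// Sketch only: wraps the "original" executor and reports metadata to Atlas.
public class AtlasExecutor implements PipelineExecutor {

    private final PipelineExecutor delegate;

    public AtlasExecutor(PipelineExecutor delegate) {
        this.delegate = delegate;
    }

    @Override
    public CompletableFuture<JobClient> execute(
            Pipeline pipeline, Configuration configuration) throws Exception {
        // Extract whatever metadata Atlas needs, then hand off to the
        // real executor so submission behaviour stays unchanged.
        reportToAtlas(pipeline, configuration);
        return delegate.execute(pipeline, configuration);
    }

    private void reportToAtlas(Pipeline pipeline, Configuration configuration) {
        // Placeholder: translate the pipeline's sources/sinks into Atlas entities.
    }
}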
Hi all!
In general, nice idea to support this integration with Atlas.
I think we could make this a bit easier and more lightweight with a small change.
One of the issues that is not super nice is that this starts exposing the
(currently empty) Pipeline interface in the public API.
The Pipeline is an SPI
Thanks! I'm reading the document now and will get back to you.
Best,
Aljoscha
Hi all,
We have added the interface for registering the connectors in custom
user-defined functions, like representing enrichment from an HBase table in
the middle of a Flink application. We are reaching out to the Atlas
community to review the implementation in the near future too, based on
Hi all!
Thank you for the patience!
We have created a small design document for the change proposal detailing
the minimal required changes in Flink for the initial version of the Atlas
integration.
You can find the document here:
Thanks for the feedback Aljoscha!
I have a POC ready with the Flink changes + the Atlas hook implementation.
I will try to push this to a public repo tomorrow and we can discuss
further based on that!
Gyula
On Thu, Feb 13, 2020, 15:26 Aljoscha Krettek wrote:
> I think exposing the Pipeline
I think exposing the Pipeline should be ok. Using the internal
StreamGraph might be problematic because this might change/break, but
that's a problem of the external code.
Aljoscha
On 11.02.20 16:26, Gyula Fóra wrote:
Hi All!
I have made a prototype that simply adds a getPipeline() method to
Hi All!
I have made a prototype that simply adds a getPipeline() method to the
JobClient interface. Then I could easily implement the Atlas hook using the
JobListener interface. I simply check whether the Pipeline is an instance of
StreamGraph and do the logic there (see the sketch below).
I think this is so far the cleanest
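To make the prototype concrete, a sketch of such a hook; note that JobClient#getPipeline() is the proposed addition from this thread, not a method of the released JobClient interface, and the Atlas-side logic is left as comments:

import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.dag.Pipeline;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;
import org.apache.flink.streaming.api.graph.StreamGraph;

public class AtlasJobListener implements JobListener {

    @Override
    public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
        if (jobClient == null) {
            return; // submission failed
        }
        // Proposed method, not part of released Flink.
        Pipeline pipeline = jobClient.getPipeline();
        if (pipeline instanceof StreamGraph) {
            StreamGraph streamGraph = (StreamGraph) pipeline;
            // Walk the StreamGraph here and create the Atlas entities.
        }
    }

    @Override
    public void onJobExecuted(JobExecutionResult result, Throwable throwable) {
        // Optionally mark the corresponding Atlas entities as finished.
    }
}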
Maybe we could improve the Pipeline interface in the long run, but as a
temporary solution the JobClient could expose a getPipeline() method.
That way the implementation of the JobListener could check if it's a
StreamGraph or a Plan.
How bad does that sound?
Gyula
On Fri, Feb 7, 2020 at 10:19
Hi Aljoscha!
That's a valid concern, but we should try to figure something out; many
users need this before they can use Flink.
I think the closest thing we have right now is the StreamGraph. In contrast
with the JobGraph, the StreamGraph is pretty nice from a metadata
perspective :D
The big
If we need it, we can probably beef up the JobListener to allow
accessing some information about the whole graph or sources and sinks.
My only concern is that we don't have a stable interface for
our job graphs/pipelines right now.
Best,
Aljoscha
On 06.02.20 23:00, Gyula Fóra
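Purely speculatively, a "beefed up" listener might hand the hook a summary object along these lines (none of these types or methods exist in Flink; this only illustrates exposing sources and sinks without a stable pipeline interface):

import java.util.List;

// Speculative shape for the information a richer JobListener could expose.
public interface JobGraphSummary {
    String jobName();
    List<String> sourceDescriptions(); // e.g. Kafka topics, file paths
    List<String> sinkDescriptions();
}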
Hi Jeff & Till!
Thanks for the feedback, this is exactly the discussion I was looking for.
The JobListener looks very promising if we can expose the JobGraph somehow
(correct me if I am wrong, but it is not accessible at the moment).
I did not know about this feature; that's why I added my
Hi Gyula,
Flink 1.10 introduced the JobListener, which is invoked after job submission
and completion. Maybe we can add an API on JobClient to get the info you need
for the Atlas integration.
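For reference, this is the JobListener interface as it exists in Flink 1.10 (org.apache.flink.core.execution.JobListener), registered on the execution environment via registerJobListener(...); javadoc comments abridged:

public interface JobListener {

    // Called after a job has been submitted; exactly one of the two
    // arguments is non-null.
    void onJobSubmitted(JobClient jobClient, Throwable throwable);

    // Called when the job's execution result is available, e.g. after a
    // blocking execute() call returns.
    void onJobExecuted(JobExecutionResult jobExecutionResult, Throwable throwable);
}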
Hi Gyula,
technically speaking, the JobGraph is sent to the Dispatcher, where a
JobMaster is started to execute the JobGraph. The JobGraph comes either
from the JobSubmitHandler or the JarRunHandler. Except for creating the
ExecutionGraph from the JobGraph there is not much happening on the
@Till Rohrmann
You are completely right that the Atlas hook itself should not live inside
Flink. The hooks for the other projects are implemented as part of
Atlas,
and the Atlas community is ready to maintain it once we have a working
version. The discussion is more about changes that we
As far as I know, Atlas entries can be created with a REST call. Can we not
create an abstracted Flink operator that makes the REST call on job
execution/submission?
Regards,
Taher Koitawala
On Wed, Feb 5, 2020, 10:16 PM Flavio Pompermaier
wrote:
> Hi Gyula,
> thanks for taking care of
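A minimal sketch of the kind of REST call described above, using Java 11's HttpClient; the Atlas host, credentials, type name, and entity JSON are placeholder assumptions (Atlas exposes a v2 entity endpoint, but the exact payload depends on the installed typedefs):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AtlasRestSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials.
        String atlasUrl = "http://atlas-host:21000/api/atlas/v2/entity";
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));

        // Minimal placeholder payload; real entities must match Atlas typedefs.
        String entityJson =
                "{\"entity\": {\"typeName\": \"flink_application\","
                + " \"attributes\": {\"qualifiedName\": \"my-job@cluster\","
                + " \"name\": \"my-job\"}}}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(atlasUrl))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + auth)
                .POST(HttpRequest.BodyPublishers.ofString(entityJson))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}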
Hi Gyula,
thanks for taking care of integrating Flink with Atlas (and, in the end, the
Egeria initiative), which is IMHO the most important part of the whole
Hadoop ecosystem and which, unfortunately, has been quite overlooked. I can
confirm that the integration with Atlas/Egeria is absolutely of great
interest.
Hi Gyula,
thanks for starting this discussion. Before diving into the details of how to
implement this feature, I wanted to ask whether it is strictly required
that the Atlas integration lives within Flink or not? Could it also work if
you have a tool which receives job submissions, extracts the
Hi all!
We have started some preliminary work on the Flink - Atlas integration at
Cloudera. It seems that the integration will require some new hook
interfaces at the JobGraph generation and submission phases, so I figured I
would open a discussion thread with my initial ideas to get some early