Hi Xintong,

Do you have any jiras that cover any of the items on 1 or 2? I can reach out to 
folks internally and see if I can get some folks to commit to helping out.

To cover the other qs:

  *   Yes, we’ve not got a plan at the moment to get off Mesos. We use Yarn for 
some of our Flink workloads when we can. Mesos is only used when we need streaming 
capabilities in our WW DCs (as our Yarn is centralized in one DC).
  *   We’re currently on Flink 1.9 (old planner). We have a plan to bump to 
1.11 / 1.12 this quarter.
  *   We typically upgrade once every 6 months to a year (not every release). 
We’d like to speed up the cadence but we’re not there yet.
  *   We’d largely be good with keeping Flink on Mesos as-is and functional 
while missing out on some of the newer features. We understand the pain on the 
community’s side, and if we see an improvement in Flink on Yarn / K8s that we 
want in Mesos, we can take on the work of putting in the request to port it 
over.

Thanks,

-- Piyush


From: Xintong Song <tonysong...@gmail.com>
Date: Sunday, October 25, 2020 at 10:57 PM
To: dev <d...@flink.apache.org>, user <user@flink.apache.org>
Cc: Lasse Nedergaard <lassenedergaardfl...@gmail.com>, <p.nar...@criteo.com>
Subject: Re: [SURVEY] Remove Mesos support

Thanks for sharing the information with us, Piyush and Lasse.



@Piyush



Thanks for offering the help. IMO, there are currently several problems that 
make supporting Flink on Mesos challenging for us.

  1.  Lack of Mesos experts. AFAIK, there are very few people (if any) 
among the active contributors in this community that are familiar with Mesos 
and can help with development on this component.
  2.  Absence of tests. Mesos does not provide a testing cluster, like 
`MiniYARNCluster`, making it hard to test interactions between Flink and Mesos. 
We have only a few very simple e2e tests running on Mesos deployed in Docker, 
covering the most fundamental workflows. We are not sure how well those tests 
work, especially against some potential corner cases.
  3.  Divergence from other deployments. Because of 1 and 2, the new efforts 
(features, maintenance, refactors) tend to exclude Mesos if possible. When the 
new efforts have to touch the Mesos related components (e.g., changes to the 
common resource manager interfaces), we have to be very careful and make as few 
changes as possible, to avoid accidentally breaking anything that we are not 
familiar with. As a result, the component diverges a lot from other deployment 
components (K8s/Yarn), which makes it harder to maintain.

It would be greatly appreciated if you can help with any of the above issues.



Additionally, I have a few questions concerning your use cases at Criteo. IIUC, 
you are going to stay on Mesos in the foreseeable future, while keeping the 
Flink version up-to-date? What Flink version are you currently using? How often 
do you upgrade (e.g., every release)? Would you be good with keeping the Flink 
on Mesos component as it is (meaning that deployment and resource management 
improvements may not be ported to Mesos), while keeping other components 
up-to-date (e.g., improvements to programming APIs, operators, state backends, 
etc.)?



Thank you~

Xintong Song


On Sat, Oct 24, 2020 at 2:48 AM Lasse Nedergaard 
<lassenedergaardfl...@gmail.com> wrote:
Hi

At Trackunit we have been using Mesos for a long time but have now moved to K8s.
Best regards
Lasse Nedergaard



On Oct 23, 2020, at 17:01, Robert Metzger 
<rmetz...@apache.org> wrote:

Hey Piyush,
thanks a lot for raising this concern. I believe we should keep Mesos in Flink 
for the foreseeable future then.
Your offer to help is much appreciated. We'll let you know once there is 
something.

On Fri, Oct 23, 2020 at 4:28 PM Piyush Narang 
<p.nar...@criteo.com> wrote:
Thanks Kostas. If there are items we can help with, I'm sure we'd be able to find 
folks who would be excited to contribute / help in any way.

-- Piyush


On 10/23/20, 10:25 AM, "Kostas Kloudas" 
<kklou...@gmail.com> wrote:

    Thanks Piyush for the message.
    After this, I revoke my +1. I agree with the previous opinions that we
    cannot drop code that is actively used by users, especially if it is
    something as deep in the stack as support for a cluster management
    framework.

    Cheers,
    Kostas

    On Fri, Oct 23, 2020 at 4:15 PM Piyush Narang 
<p.nar...@criteo.com> wrote:
    >
    > Hi folks,
    >
    >
    >
    > We at Criteo are active users of the Flink on Mesos resource management 
component. We are pretty heavy users of Mesos for scheduling workloads on our 
edge datacenters and we do want to continue to be able to run some of our Flink 
topologies (to compute machine learning short term features) on those DCs. If 
possible, our vote would be not to drop Mesos support, as that would tie us to an 
old release or force us to maintain a fork, since we’re not planning to migrate off 
Mesos anytime soon. Is the burden something the community can help with? (Or are 
you referring to having to ensure PRs handle the Mesos piece 
as well when they touch the resource managers?)
    >
    >
    >
    > Thanks,
    >
    >
    >
    > -- Piyush
    >
    >
    >
    >
    >
    > From: Till Rohrmann <trohrm...@apache.org>
    > Date: Friday, October 23, 2020 at 8:19 AM
    > To: Xintong Song <tonysong...@gmail.com>
    > Cc: dev <d...@flink.apache.org>, user <user@flink.apache.org>
    > Subject: Re: [SURVEY] Remove Mesos support
    >
    >
    >
    > Thanks for starting this survey Robert! I second Konstantin and Xintong 
in the sense that our Mesos users' opinions should matter most here. If our 
community is no longer using the Mesos integration, then I would be +1 for 
removing it in order to decrease the maintenance burden.
    >
    >
    >
    > Cheers,
    >
    > Till
    >
    >
    >
    > On Fri, Oct 23, 2020 at 2:03 PM Xintong Song 
<tonysong...@gmail.com> wrote:
    >
    > +1 for adding a warning in 1.12 about planning to remove Mesos support.
    >
    >
    >
    > With my developer hat on, removing the Mesos support would definitely 
reduce the maintenance overhead for the deployment and resource management 
related components. On the other hand, the Flink on Mesos users' voices 
definitely matter a lot for this community. Either way, it would be good to 
draw users' attention to this discussion early.
    >
    >
    >
    > Thank you~
    >
    > Xintong Song
    >
    >
    >
    >
    >
    > On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf 
<kna...@apache.org> wrote:
    >
    > Hi Robert,
    >
    > +1 to the plan you outlined. If we were to drop support in Flink 1.13+, we
    > would still support it in Flink 1.12- with bug fixes for some time so that
    > users have time to move on.
    >
    > It would certainly be very interesting to hear from current Flink on Mesos
    > users, on how they see the evolution of this part of the ecosystem.
    >
    > Best,
    >
    > Konstantin
