Konstantin,

Not a problem. Thanks for pointing me in the right direction.

David

On Tue, Mar 22, 2016 at 5:17 PM, Konstantin Knauf <konstantin.kn...@tngtech.com> wrote:

> Hi David,
>
> Interesting use case. I think this can be done nicely with a comap. Let
> me know if you run into problems; unfortunately, I am not aware of any
> open-source examples.
>
> Cheers,
>
> Konstantin
>
> On 22.03.2016 21:07, David Brelloch wrote:
> > Konstantin,
> >
> > For now, the jobs will largely just involve incrementing or decrementing
> > counts based on the JSON messages coming in. We will probably look at
> > adding windowing later, but for now that isn't a huge priority.
> >
> > As an example of what we are looking to do, let's say the following
> > three messages were read from the Kafka topic:
> > {"customerid": 1, "event": "addAdmin"}
> > {"customerid": 1, "event": "addAdmin"}
> > {"customerid": 1, "event": "deleteAdmin"}
> >
> > If the customer with ID 1 had said they care about that type of
> > message, we would expect to track the number of admins and notify
> > them that they currently have 2. The events are obviously much more
> > complicated than that, and they are not uniform, but that is the
> > general overview.
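> >
> > To make that concrete, here is a rough sketch of the counting side of
> > this (the class and field names are placeholders, the three example
> > messages stand in for the Kafka source, and the plain HashMap stands in
> > for what would really be Flink's keyed/checkpointed state):
> >
> > import org.apache.flink.api.common.functions.FlatMapFunction;
> > import org.apache.flink.api.java.tuple.Tuple2;
> > import org.apache.flink.streaming.api.datastream.DataStream;
> > import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> > import org.apache.flink.util.Collector;
> >
> > import java.util.HashMap;
> > import java.util.Map;
> >
> > public class AdminCountSketch {
> >
> >     // Placeholder for our parsed JSON events.
> >     public static class AdminEvent {
> >         public int customerid;
> >         public String event;
> >
> >         public AdminEvent() {}
> >
> >         public AdminEvent(int customerid, String event) {
> >             this.customerid = customerid;
> >             this.event = event;
> >         }
> >     }
> >
> >     public static void main(String[] args) throws Exception {
> >         StreamExecutionEnvironment env =
> >                 StreamExecutionEnvironment.getExecutionEnvironment();
> >
> >         // Stand-in for the Kafka source, using the three messages above.
> >         DataStream<AdminEvent> events = env.fromElements(
> >                 new AdminEvent(1, "addAdmin"),
> >                 new AdminEvent(1, "addAdmin"),
> >                 new AdminEvent(1, "deleteAdmin"));
> >
> >         events.keyBy("customerid")
> >                 .flatMap(new FlatMapFunction<AdminEvent, Tuple2<Integer, Integer>>() {
> >                     // Running admin count per customer seen by this subtask;
> >                     // a real job would keep this in Flink's checkpointed state.
> >                     private final Map<Integer, Integer> counts = new HashMap<>();
> >
> >                     @Override
> >                     public void flatMap(AdminEvent e, Collector<Tuple2<Integer, Integer>> out) {
> >                         int delta = "addAdmin".equals(e.event) ? 1
> >                                 : "deleteAdmin".equals(e.event) ? -1 : 0;
> >                         if (delta != 0) {
> >                             int count = counts.containsKey(e.customerid)
> >                                     ? counts.get(e.customerid) + delta
> >                                     : delta;
> >                             counts.put(e.customerid, count);
> >                             // Emits the running count, e.g. (1,2) after the
> >                             // two addAdmin events.
> >                             out.collect(new Tuple2<>(e.customerid, count));
> >                         }
> >                     }
> >                 })
> >                 .print();
> >
> >         env.execute("admin count sketch");
> >     }
> > }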
> >
> > I will take a look at using the comap operator. Do you know of any
> > examples where it is doing something similar? From a quick look, I am
> > not seeing it used anywhere outside of tests, where it is largely just
> > unifying the data coming in.
> >
> > I think accumulators will at least be a reasonable starting place for us,
> > so thank you for pointing me in that direction.
> >
> > Thanks for your help!
> >
> > David
> >
> > On Tue, Mar 22, 2016 at 3:27 PM, Konstantin Knauf
> > <konstantin.kn...@tngtech.com> wrote:
> >
> >     Hi David,
> >
> >     I have no idea how many parallel jobs are possible in Flink, but
> >     generally speaking I do not think this approach will scale, because
> >     you will always have only one job manager for coordination. But there
> >     is definitely someone on the list who can tell you more about this.
> >
> >     Regarding your second question: could you go into some more detail
> >     about what the jobs will do? Without knowing any details, I think a
> >     control Kafka topic, which contains the "job creation/cancellation
> >     requests" of the users, in combination with a comap operator is the
> >     better solution here. You could keep the currently active "jobs" as
> >     state in the comap and emit one record of the original stream per
> >     active user-job, together with some indicator on how to process it
> >     based on the request. What are your concerns with respect to insight
> >     into the process? I think with some nice accumulators you could get a
> >     good idea of what is going on; on the other hand, if I think about
> >     monitoring thousands of jobs, I am actually not so sure ;)
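> >
> >     Just to sketch the comap idea (the Event and JobRequest types, their
> >     field names, and the keying by customerid are all made up here, and a
> >     real job would keep the active jobs in Flink's checkpointed state
> >     rather than a plain map), it could look roughly like this:
> >
> >     import org.apache.flink.api.java.tuple.Tuple3;
> >     import org.apache.flink.streaming.api.datastream.DataStream;
> >     import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
> >     import org.apache.flink.util.Collector;
> >
> >     import java.util.HashMap;
> >     import java.util.Map;
> >
> >     public class ControlledJobsSketch {
> >
> >         // Hypothetical parsed event from the main topic.
> >         public static class Event {
> >             public int customerid;
> >             public String event;
> >         }
> >
> >         // Hypothetical "job creation/cancellation request" from the
> >         // control topic.
> >         public static class JobRequest {
> >             public int customerid;
> >             public String jobId;
> >             public boolean create;    // false means cancel
> >             public String indicator;  // how downstream should process matches
> >         }
> >
> >         // Keeps the currently active jobs as state and emits one record of
> >         // the original stream per active user-job of the event's customer.
> >         public static class JobDispatcher implements
> >                 CoFlatMapFunction<Event, JobRequest, Tuple3<String, String, Event>> {
> >
> >             // customerid -> (jobId -> indicator). A real job would keep
> >             // this in Flink's checkpointed state so it survives failures.
> >             private final Map<Integer, Map<String, String>> activeJobs = new HashMap<>();
> >
> >             @Override
> >             public void flatMap1(Event e, Collector<Tuple3<String, String, Event>> out) {
> >                 Map<String, String> jobs = activeJobs.get(e.customerid);
> >                 if (jobs == null) {
> >                     return; // no active job cares about this customer
> >                 }
> >                 for (Map.Entry<String, String> job : jobs.entrySet()) {
> >                     out.collect(new Tuple3<>(job.getKey(), job.getValue(), e));
> >                 }
> >             }
> >
> >             @Override
> >             public void flatMap2(JobRequest req, Collector<Tuple3<String, String, Event>> out) {
> >                 // Requests from the control topic only update the state.
> >                 if (req.create) {
> >                     Map<String, String> jobs = activeJobs.get(req.customerid);
> >                     if (jobs == null) {
> >                         jobs = new HashMap<>();
> >                         activeJobs.put(req.customerid, jobs);
> >                     }
> >                     jobs.put(req.jobId, req.indicator);
> >                 } else if (activeJobs.containsKey(req.customerid)) {
> >                     activeJobs.get(req.customerid).remove(req.jobId);
> >                 }
> >             }
> >         }
> >
> >         // Wires the two streams together, co-partitioned by customer.
> >         public static DataStream<Tuple3<String, String, Event>> tagEvents(
> >                 DataStream<Event> events, DataStream<JobRequest> control) {
> >             return events
> >                     .connect(control)
> >                     .keyBy("customerid", "customerid")
> >                     .flatMap(new JobDispatcher());
> >         }
> >     }
> >
> >     Downstream you could then key by the jobId and do the per-job logic
> >     (counting, thresholds, alerting), and per-job accumulators or metrics
> >     would give you at least some of the insight you are worried about
> >     losing.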
> >
> >     Cheers,
> >
> >     Konstantin
> >
> >     On 22.03.2016 19:16, David Brelloch wrote:
> >     > Hi all,
> >     >
> >     > We are currently evaluating Flink for processing Kafka messages and
> >     > are running into some issues. The basic problem we are trying to
> >     > solve is allowing our end users to dynamically create jobs that
> >     > alert based on the messages coming from Kafka. At launch we figure
> >     > we need to support at least 15,000 jobs (3,000 customers with 5 jobs
> >     > each). I have the example Kafka job running and it is working great.
> >     > The questions I have are:
> >     >
> >     >  1. On my local machine (admittedly much less powerful than we would
> >     >     be using in production) things fall apart once I get to around
> >     >     75 jobs. Can Flink handle a situation like this, where we are
> >     >     looking at thousands of jobs?
> >     >  2. Is this approach even the right way to go? Is there a different
> >     >     approach that would make more sense? Everything will be
> >     >     listening to the same Kafka topic, so the other thought we had
> >     >     was to have one job that processed everything and was configured
> >     >     by a separate control Kafka topic. The concern we had there was
> >     >     that we would almost completely lose insight into what was going
> >     >     on if there was a slowdown.
> >     >  3. The current approach we are using for creating dynamic jobs is
> >     >     building a common jar and then starting it with the
> >     >     configuration data for the individual job. Does this sound
> >     >     reasonable?
> >     >
> >     > If any of these questions are answered elsewhere, I apologize. I
> >     > couldn't find any of this being discussed elsewhere.
> >     >
> >     > Thanks for your help.
> >     >
> >     > David
> >
> >     --
> >     Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> >     TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> >     Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> >     Sitz: Unterföhring * Amtsgericht München * HRB 135082
> >
> >
>
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>
