Re: Flink Standalone cluster - production settings

2019-02-28 Thread Padarn Wilson
Are you able to give some detail on the cases in which you might be better
off setting a higher (or lower) parallelism for an operator?

On Thu, Feb 21, 2019 at 9:54 PM Hung wrote:

> /Each job has 3 async operators
> with Executors with thread counts of 20, 20, 100/
>
> Flink handles parallelism for you. If you want a higher parallelism for an
> operator, you can call setParallelism(),
> for example:
>
> flatMap(new Mapper1()).setParallelism(20)
> flatMap(new Mapper2()).setParallelism(20)
> flatMap(new Mapper3()).setParallelism(100)
>
> You can check the official documentation here:
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/parallel.html#setting-the-parallelism
>
> /Currently we are using parallelism = 1/
> I guess you set the job-level parallelism.
>
> I would suggest replacing the Executors with Flink's parallelism. It
> would be more efficient, because you would not create additional thread
> pools on top of the one that Flink already provides (I may not be
> describing this concept exactly right).
>
> Cheers,
>
> Sendoh
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Re: Flink Standalone cluster - production settings

2019-02-21 Thread Hung
/Each job has 3 async operators
with Executors with thread counts of 20, 20, 100/

Flink handles parallelism for you. If you want a higher parallelism for an
operator, you can call setParallelism(),
for example:

flatMap(new Mapper1()).setParallelism(20)
flatMap(new Mapper2()).setParallelism(20)
flatMap(new Mapper3()).setParallelism(100)

You can check the official documentation here:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/parallel.html#setting-the-parallelism

/Currently we are using parallelism = 1/
I guess you set the job-level parallelism.

I would suggest replacing the Executors with Flink's parallelism. It
would be more efficient, because you would not create additional thread
pools on top of the one that Flink already provides (I may not be
describing this concept exactly right).

Cheers,

Sendoh







Flink Standalone cluster - production settings

2019-02-10 Thread simpleusr
I know this seems a silly question, but I am trying to figure out the optimal
setup for our Flink jobs.
We are using a standalone cluster with 5 jobs. Each job has 3 async operators
with Executors with thread counts of 20, 20, and 100. The source is Kafka, and
there are Cassandra and REST sinks.
Currently we are using parallelism = 1. So, at max load a single job spawns
at least 140 threads. We are also using Netty-based libraries for the
Cassandra and REST calls. (As I can see in the thread dump, Flink also uses a
Netty server.)

What we see is that the total thread count adds up to ~500 for a single job.
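
One way to check a figure like the ~500 threads mentioned above is to ask the
JVM directly. The following is a minimal sketch using the standard
java.lang.management API. The class name ThreadCountProbe is hypothetical,
and the probe only reports the JVM it runs in, so for a Flink job it would
have to be called from code running inside the TaskManager process (an
assumption about where the threads live).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Flink: reports thread counts of the current JVM.
public class ThreadCountProbe {

    /** Number of live threads (daemon and non-daemon) in this JVM right now. */
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    /** Highest live thread count seen since the JVM started (or since the peak was reset). */
    static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreads());
        System.out.println("peak threads: " + peakThreads());
    }
}
```

The peak value is usually the more useful one for sizing a limit, since the
live count can be well below it outside of max load.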

The issue we faced is that, all of a sudden, all jobs began to fail in
production, and we saw that it was mainly due to the ulimit on user
processes. All jobs had started on one server in the cluster (I do not know
why, as it is a cluster with 3 members).

The limit was set to around 1500 on that server. We then set a higher value,
and the problems seem to have gone away.
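
The numbers above already explain the failure: on Linux the user process
limit also counts threads, so 5 jobs at ~500 threads each on one server need
about 2500, well over a limit of 1500. A small sketch of that arithmetic
follows. The class and method names are hypothetical, and the inputs are the
approximate figures from this post, not measured values.

```java
// Hypothetical helper: worst-case thread budget when every job lands on one host.
public class ThreadBudget {

    /** Threads needed on one host if all jobs are scheduled there. */
    static int worstCaseThreads(int jobs, int threadsPerJob) {
        return jobs * threadsPerJob;
    }

    /** True if the worst case stays within the per-user process/thread limit. */
    static boolean fitsUnderLimit(int jobs, int threadsPerJob, int processLimit) {
        return worstCaseThreads(jobs, threadsPerJob) <= processLimit;
    }

    public static void main(String[] args) {
        int jobs = 5;            // from the post
        int threadsPerJob = 500; // approximate observed total per job
        int processLimit = 1500; // approximate ulimit on that server

        System.out.println("worst-case threads: " + worstCaseThreads(jobs, threadsPerJob));
        // prints "worst-case threads: 2500"
        System.out.println("fits under limit: " + fitsUnderLimit(jobs, threadsPerJob, processLimit));
        // prints "fits under limit: false"
    }
}
```

The limit applies per user, so any other processes running as the same user
on that server would reduce the headroom further.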

Can you recommend an optimal production setting for a standalone cluster? Or
should there be a max limit on the threads spawned by a single job?
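
As an illustration of what a per-job thread cap could look like, here is a
sketch of a bounded pool built only from java.util.concurrent. It is not a
Flink API and not a recommendation from this thread. The class name
BoundedPool, the 30-second idle timeout, and the choice of CallerRunsPolicy
are all assumptions. In particular, CallerRunsPolicy runs the task on the
submitting thread when the queue is full, which may not be acceptable inside
an async operator.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a thread pool with a hard upper bound on threads and queued tasks.
public class BoundedPool {

    /** Creates a pool that never exceeds maxThreads and lets idle threads exit. */
    static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,                      // core == max, so the cap is fixed
                30, TimeUnit.SECONDS,                        // assumed idle timeout
                new ArrayBlockingQueue<>(queueCapacity),     // bounded queue, no unbounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy());  // overflow runs on the submitter
        pool.allowCoreThreadTimeOut(true);                   // release threads when load drops
        return pool;
    }

    /** Runs short simulated calls and returns the largest number of pool threads ever alive. */
    static int largestPoolSizeAfter(int maxThreads, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = create(maxThreads, queueCapacity);
        for (int i = 0; i < tasks; i++) {
            pool.execute(BoundedPool::simulateCall);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getLargestPoolSize();
    }

    // Stand-in for a remote call (for example a REST or Cassandra request).
    private static void simulateCall() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int largest = largestPoolSizeAfter(20, 100, 1000);
        System.out.println("largest pool size: " + largest + " (cap was 20)");
    }
}
```

With a cap like this, the number of threads a job can create through its own
executors is known in advance, which makes it easier to compare against the
process limit on the server.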

Regards





Flink Standalone cluster - production settings

2019-02-08 Thread simpleusr
I know this seems a silly question, but I am trying to figure out the optimal
setup for our Flink jobs.
We are using a standalone cluster with 5 jobs. Each job has 3 async operators
with Executors with thread counts of 20, 20, and 100. The source is Kafka, and
there are Cassandra and REST sinks.
Currently we are using parallelism = 1. So at max load a single job spawns
at least 140 threads. We are also using Netty-based libraries for the
Cassandra and REST calls. (As I can see in the thread dump, Flink also uses a
Netty server.)
What we see is that the total thread count adds up to ~500 for a single job.

Suddenly all jobs began to fail in production, and we saw that it was mainly
due to the ulimit on user processes. All jobs started on one server in the
cluster (I do not know why, as it is a cluster with 3 members).
The limit was set to around 1500 on that server. We then set a higher value,
and the problems seem to have gone away.

Can you recommend an optimal production setting for a standalone cluster? Or
should there be a max limit on the threads spawned by a single job?

Regards



