Re: Task Parallelism in a Cluster

2015-12-11 Thread Stephan Ewen
Hi Ali! I see, so the tasks 192.168.200.174 and 192.168.200.175 apparently do not make progress and do not even recognize the end-of-stream point. I expect that the streams on 192.168.200.174 and 192.168.200.175 are back-pressured to a standstill. Since no network is involved, the reason for the

Re: Task Parallelism in a Cluster

2015-12-11 Thread Kashmar, Ali
Hi Stephan, I’m using DataStream.writeAsText(String path, WriteMode writemode) for my sink. The data is written to disk and there’s plenty of space available. I looked deeper into the logs and found out that the jobs on 174 and 175 are not actually stuck, but they’re moving extremely slowly,

Re: Task Parallelism in a Cluster

2015-12-11 Thread Kashmar, Ali
Hi Stephan, I got a request to share the image with someone and I assume it was you. You should be able to see it now. This seems to be the main issue I have at this time. I've tried running the job on the cluster with a parallelism of 16, 24, 36, and even went up to 48. I see all the parallel

Re: Task Parallelism in a Cluster

2015-12-10 Thread Stephan Ewen
Hi Ali! Seems like the Google Doc has restricted access, it tells me I have no permission to view it... Stephan On Wed, Dec 9, 2015 at 8:49 PM, Kashmar, Ali wrote: > Hi Stephan, > > Here’s a link to the screenshot I tried to attach earlier: > >

Re: Task Parallelism in a Cluster

2015-12-09 Thread Kashmar, Ali
Hi Stephan, That was my original understanding, until I realized that I was not using a parallel socket source. I had a custom source that extended SourceFunction which always runs with parallelism = 1. I looked through the API and found the ParallelSourceFunction interface so I implemented that

Re: Task Parallelism in a Cluster

2015-12-09 Thread Kashmar, Ali
Hi Stephan, Here’s a link to the screenshot I tried to attach earlier: https://drive.google.com/open?id=0B0_jTR8-IvUcMEdjWGFmYXJYS28 It looks to me like the distribution is fairly skewed across the nodes, even though they’re executing the same pipeline. Thanks, Ali On 2015-12-09, 12:36 PM,

Re: Task Parallelism in a Cluster

2015-12-09 Thread Stephan Ewen
Hi! The parallel socket source looks good. I think you forgot to attach the screenshot, or the mailing list dropped the attachment... Not sure if I can diagnose that without more details. The sources all do the same. Assuming that the server distributes the data evenly across all connected

Re: Task Parallelism in a Cluster

2015-12-08 Thread Stephan Ewen
Hi Ali! In the case you have, the sequence of source-map-filter ... forms a pipeline. You mentioned that you set the parallelism to 16, so there should be 16 pipelines. These pipelines should be completely independent. Looking at the way the scheduler is implemented, independent pipelines

Re: Task Parallelism in a Cluster

2015-12-02 Thread Till Rohrmann
If I'm not mistaken, the scheduler already has a preference to spread independent pipelines out across the cluster. At least it uses a queue of instances from which it pops the first element when it allocates a new slot. This instance is then appended to the queue again, if it has some
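
A minimal sketch of the queue behaviour described above, not Flink's actual scheduler code. It assumes the truncated sentence continues with "free slots left", and the names RoundRobinSlots, Instance, cluster, allocate and tm-1..tm-3 are hypothetical and used only for illustration. The cluster size (3 nodes, 16 slots each) and the parallelism of 16 are taken from the thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RoundRobinSlots {

    /** Hypothetical stand-in for a task manager with a number of free slots. */
    public static final class Instance {
        final String name;
        int freeSlots;

        Instance(String name, int freeSlots) {
            this.name = name;
            this.freeSlots = freeSlots;
        }
    }

    /** Builds a queue of identical instances named tm-1, tm-2, ... */
    public static Deque<Instance> cluster(int nodes, int slotsPerNode) {
        Deque<Instance> queue = new ArrayDeque<>();
        for (int n = 1; n <= nodes; n++) {
            queue.addLast(new Instance("tm-" + n, slotsPerNode));
        }
        return queue;
    }

    /**
     * Allocates one slot per parallel pipeline: pop the first instance,
     * take a slot from it, and append it again if it still has free slots
     * (assumed completion of the truncated sentence).
     */
    public static List<String> allocate(Deque<Instance> queue, int parallelism) {
        List<String> assignment = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            Instance next = queue.pollFirst();
            if (next == null) {
                throw new IllegalStateException("not enough free slots");
            }
            next.freeSlots--;
            assignment.add(next.name);
            if (next.freeSlots > 0) {
                queue.addLast(next);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 3 nodes with 16 slots each, job parallelism 16, as in the thread.
        System.out.println(allocate(cluster(3, 16), 16));
    }
}
```

Under these assumptions the 16 pipelines are split 6/5/5 across the three instances rather than all landing on the first one, which is the spreading preference the message refers to.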

Re: Task Parallelism in a Cluster

2015-12-01 Thread Ufuk Celebi
> On 01 Dec 2015, at 15:26, Kashmar, Ali wrote: > > Is there a way to make a task cluster-parallelizable? I.e. Make sure the > parallel instances of the task are distributed across the cluster. When I > run my flink job with a parallelism of 16, all the parallel tasks are >

Re: Task Parallelism in a Cluster

2015-12-01 Thread Kashmar, Ali
Is there a way to make a task cluster-parallelizable? I.e. Make sure the parallel instances of the task are distributed across the cluster. When I run my flink job with a parallelism of 16, all the parallel tasks are assigned to the first task manager. - Ali On 2015-11-30, 2:18 PM, "Ufuk Celebi"

Task Parallelism in a Cluster

2015-11-30 Thread Kashmar, Ali
Hello, I’m trying to wrap my head around task parallelism in a Flink cluster. Let’s say I have a cluster of 3 nodes, each node offering 16 task slots, so in total I’d have 48 slots for processing. Do the parallel instances of each task get distributed across the cluster or is it possible

Re: Task Parallelism in a Cluster

2015-11-30 Thread Ufuk Celebi
> On 30 Nov 2015, at 17:47, Kashmar, Ali wrote: > Do the parallel instances of each task get distributed across the cluster or > is it possible that they all run on the same node? Yes, slots are requested from all nodes of the cluster. But keep in mind that multiple tasks