Hi Ali!
I see, so the tasks on 192.168.200.174 and 192.168.200.175 apparently make
no progress and do not even recognize the end-of-stream point.
I expect that the streams on 192.168.200.174 and 192.168.200.175 are
back-pressured to a standstill. Since no network is involved, the reason
for the
Hi Stephan,
I’m using DataStream.writeAsText(String path, WriteMode writemode) for my
sink. The data is written to disk and there’s plenty of space available.
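For reference, a sink wired up the way described above might look roughly like the following. This is a hedged sketch; the stream name and output path are illustrative, not taken from the thread:

```java
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.streaming.api.datastream.DataStream;

public class SinkSketch {
    // Attach a plain-text file sink to an existing stream.
    // "events" and the path are illustrative names only.
    static void addSink(DataStream<String> events) {
        events.writeAsText("/data/flink/output", WriteMode.OVERWRITE);
    }
}
```

With OVERWRITE, an existing file at the target path is replaced on each run rather than causing the job to fail.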
I looked deeper into the logs and found out that the jobs on 174 and 175
are not actually stuck, but they’re moving extremely slowly,
Hi Stephan,
I got a request to share the image with someone and I assume it was you.
You should be able to see it now. This seems to be the main issue I have
at this time. I've tried running the job on the cluster with a parallelism
of 16, 24, 36, and even went up to 48. I see all the parallel
Hi Ali!
Seems like the Google Doc has restricted access; it tells me I have no
permission to view it...
Stephan
On Wed, Dec 9, 2015 at 8:49 PM, Kashmar, Ali wrote:
> Hi Stephan,
>
> Here’s a link to the screenshot I tried to attach earlier:
>
>
Hi Stephan,
That was my original understanding, until I realized that I was not using
a parallel socket source. I had a custom source that extended
SourceFunction which always runs with parallelism = 1. I looked through
the API and found the ParallelSourceFunction interface so I implemented
that
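A source along the lines described above might be sketched as follows. This is a hedged illustration, not the actual code from the thread; the class name and host/port handling are assumptions:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

import org.apache.flink.streaming.api.functions.source.ParallelSourceFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// A socket-reading source that implements ParallelSourceFunction, so
// Flink may run one instance per parallel subtask (unlike a plain
// SourceFunction, which is restricted to parallelism 1).
public class ParallelSocketSource implements ParallelSourceFunction<String> {

    private final String host;
    private final int port;
    private volatile boolean running = true;

    public ParallelSocketSource(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public void run(SourceFunction.SourceContext<String> ctx) throws Exception {
        try (Socket socket = new Socket(host, port);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            String line;
            while (running && (line = reader.readLine()) != null) {
                ctx.collect(line);  // emit each received line into the stream
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```

ParallelSourceFunction is a marker interface: it adds no methods over SourceFunction, but signals to Flink that the source may be instantiated with a parallelism greater than one.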
Hi Stephan,
Here’s a link to the screenshot I tried to attach earlier:
https://drive.google.com/open?id=0B0_jTR8-IvUcMEdjWGFmYXJYS28
It looks to me like the distribution is fairly skewed across the nodes,
even though they’re executing the same pipeline.
Thanks,
Ali
On 2015-12-09, 12:36 PM,
Hi!
The parallel socket source looks good.
I think you forgot to attach the screenshot, or the mailing list dropped
the attachment...
Not sure if I can diagnose that without more details. The sources all
behave the same way. Assuming that the server distributes the data evenly
across all connected
Hi Ali!
In the case you have, the sequence of source-map-filter ... forms a
pipeline.
You mentioned that you set the parallelism to 16, so there should be 16
pipelines. These pipelines should be completely independent.
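As a hedged illustration of such a job (the source, path, and job name are assumptions, not from the thread), a parallelism of 16 turns one source-map-filter chain into 16 independent pipeline instances:

```java
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(16);  // 16 independent pipeline instances

        // socketTextStream is a stand-in here; the thread uses a custom
        // ParallelSourceFunction so the source itself can run in parallel.
        env.socketTextStream("localhost", 9999)
           .map(String::trim)
           .filter(line -> !line.isEmpty())
           .writeAsText("/data/flink/output", WriteMode.OVERWRITE);

        env.execute("source-map-filter");
    }
}
```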
Looking at the way the scheduler is implemented, independent pipelines
If I'm not mistaken, the scheduler already has a preference to spread
independent pipelines out across the cluster. At least it uses a queue of
instances from which it pops the first element when it allocates a new slot.
This instance is then appended to the queue again, if it has some
> On 01 Dec 2015, at 15:26, Kashmar, Ali wrote:
>
> Is there a way to make a task cluster-parallelizable? I.e. Make sure the
> parallel instances of the task are distributed across the cluster. When I
> run my flink job with a parallelism of 16, all the parallel tasks are
>
Is there a way to make a task cluster-parallelizable? I.e. Make sure the
parallel instances of the task are distributed across the cluster. When I
run my flink job with a parallelism of 16, all the parallel tasks are
assigned to the first task manager.
- Ali
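One setting that interacts with this behavior (a hedged note, assuming a default standalone setup with illustrative values): each TaskManager advertises a fixed number of slots in flink-conf.yaml, and when a single TaskManager offers at least as many slots as the job's parallelism, all subtasks can be scheduled onto that one machine:

```yaml
# flink-conf.yaml (illustrative value): with 16 slots per TaskManager,
# a parallelism-16 job fits entirely into a single TaskManager.
taskmanager.numberOfTaskSlots: 16
```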
On 2015-11-30, 2:18 PM, "Ufuk Celebi"
Hello,
I’m trying to wrap my head around task parallelism in a Flink cluster. Let’s
say I have a cluster of 3 nodes, each node offering 16 task slots, so in total
I’d have 48 slots for processing. Do the parallel instances of each task get
distributed across the cluster or is it possible
> On 30 Nov 2015, at 17:47, Kashmar, Ali wrote:
> Do the parallel instances of each task get distributed across the cluster or
> is it possible that they all run on the same node?
Yes, slots are requested from all nodes of the cluster. But keep in mind that
multiple tasks