IMHO the only change for 2) is that you possibly get better machine utilization because it will use more parallel threads. So I think it’s a valid approach.
@Ufuk, could there be problems with the number of network buffers? I think not, because the connections are multiplexed in one channel, is this correct? I’ll also talk with the others so see if we can resolve the watermark/kafka partition issues before the 1.0 release. > On 20 Feb 2016, at 02:14, Zach Cox <zcox...@gmail.com> wrote: > > What would the differences be between these scenarios? > > 1) one task manager with numberOfTaskSlots=1 and one job with parallelism=1 > > 2) one task manager with numberOfTaskSlots=10 and one job with parallelism=10 > > In both cases all of the job's tasks get executed within the one task > manager's jvm. Are there any downsides to doing #2 instead of #1? > > I ask this question because of current issues related to watermarks with > Kafka sources [1] [2] and changing parallelism with savepoints [3]. I'm > writing a Flink job that processes events from Kafka topics that have 12 > partitions. I'm wondering if I should just set the job parallelism=12 and > make numberOfTaskSlots sum to 12 across however many task managers I set up. > It seems like watermarks would work properly then, and I could effectively > change job parallelism using the number of task managers (e.g. 1 TM with > slots=12, or 2 TMs with slots=6, or 12 TMs with slots=1, etc). > > Am I missing any important details that would make this a bad idea? It seems > like a bit of abuse of numberOfTaskSlots, but also seems like a fairly simple > solution to a few current issues. > > Thanks, > Zach > > [1] > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-partition-alignment-for-event-time-tt4782.html > [2] https://issues.apache.org/jira/browse/FLINK-3375 > [3] > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Changing-parallelism-tt4967.html >