Hi Aljoscha,
thanks for the reply. It was not urgent and I was aware of the FF... btw, congratulations on it, I saw many interesting talks! The Flink community has grown a lot since it was Stratosphere ;)

Just one last question: in many of my use cases it would be helpful to see how many of the created splits have been "consumed" by an inputFormat/source. Is it possible to monitor this somewhere in the dashboards or with a custom metric?
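To make the question concrete, this is roughly the bookkeeping I have in mind, written as plain Java with no Flink dependency. `SplitProgress`, `markConsumed`, `consumedCount` and `remaining` are hypothetical names of mine, not Flink API, and I am assuming a split counts as "consumed" once a source subtask has finished reading it. How such a counter would be hooked into the input format and exposed as a metric is exactly the part I am asking about.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not Flink API): tracks how many created splits were consumed. */
final class SplitProgress {
    private final int totalSplits;
    // Atomic, because several source threads may report finished splits concurrently.
    private final AtomicInteger consumed = new AtomicInteger();

    SplitProgress(int totalSplits) {
        if (totalSplits < 0) {
            throw new IllegalArgumentException("totalSplits must not be negative");
        }
        this.totalSplits = totalSplits;
    }

    /** Assumed to be called once whenever a split has been fully read. */
    int markConsumed() {
        return consumed.incrementAndGet();
    }

    /** Number of splits reported as consumed so far. */
    int consumedCount() {
        return consumed.get();
    }

    /** Splits created but not yet reported as consumed (never negative). */
    int remaining() {
        return Math.max(0, totalSplits - consumed.get());
    }
}
```

With something like this, a dashboard would only need to show `consumedCount()` next to the total number of created splits.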
On Tue, Apr 18, 2017 at 5:24 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi,
> sorry for not getting any responses but I think everyone was quite busy
> with Flink Forward SF. I’m also no expert on the topic but I’ll try and
> give some answers.
>
> Regarding a Google Doc version, I don’t think that there is any. You would
> have to modify the Markdown version we have in the doc.
>
> For the other answers I’ll reuse an example program that consists of
> Source -> Map -> Sink, with chaining disabled and parallelism 2. We’ll thus
> have three Tasks: Source, Map, and Sink, with each having two subtasks.
> Let’s denote the subtasks by a number in parentheses, so the first subtask
> for Source is Source(1), the second one is Source(2). I’ll also refer to
> Source(1) -> Map(1) -> Sink(1) as a slice of the execution graph since
> these can be executed within one slot.
>
> Regarding 1, I think this is true. However, a single slot can execute a
> complete slice of the execution graph where each subtask (from a different
> task) would be executed by its own thread.
>
> Regarding 2.1, Yes, I think a slot cannot run multiple subtasks of the same
> task, while it is possible (and in fact done) to execute all the subtasks of
> a slice in the same slot.
>
> Regarding 2.2, This is so to allow executing a pipeline of parallelism 8
> using a cluster that has 8 free slots. Basically, each slice fills one slot.
>
> Regarding 3, I don’t really have an answer.
>
> Regarding 4, Yes, this can get a bit out of hand if you have very long
> pipelines.
>
> Best,
> Aljoscha
>
> On 11. Apr 2017, at 14:37, Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
> Any feedback here..?
>
> On Wed, Apr 5, 2017 at 7:43 PM, Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
>> Hi to all,
>> I had a very long but useful chat with Fabian and I understood a lot of
>> concepts that were not clear at all to me. We started from the Flink
>> runtime documentation page
>> (https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/runtime.html)
>> but I discovered that the terminology is very inconsistent and misleading
>> along the page...
>>
>> For example, one of the very first sentences is:
>> "Flink chains operator subtasks together into tasks. Each task is
>> executed by one thread."
>> What I first understood was that every operator can be executed only by a
>> single thread in the whole cluster... it would probably be better to say
>> "one thread per task slot" (at least).
>> Moreover, if I'm not wrong, a Task Slot can execute only 1 subtask (aka
>> parallel instance) of each task and there's no limit to the number of
>> subtasks per slot (and this is not highlighted at all in that document).
>> The only constraint is that they should belong to different tasks (right?).
>>
>> If there's a Google Doc version of that page I could try to rewrite it
>> in order to make some parts easier to understand... however I still
>> have some more questions:
>>
>>    1. Is it correct that a single Task Slot can execute only a single
>>    subtask of each task and that this subtask is executed by a single
>>    thread within the slot?
>>    2. If it is so:
>>       1. why does that page say "By default, Flink allows subtasks to
>>       share slots even if they are subtasks of different tasks, so long
>>       as they are from the same job"? It makes it seem that it is more
>>       common to run multiple subtasks of the same task (in a slot) than
>>       to execute subtasks of different tasks, although the latter is
>>       still permitted... from what I understood a slot cannot run
>>       multiple subtasks of the same task at all!
>>       2. and why this constraint? Is there any good reason for that? A
>>       subtask is mapped to 1 thread in the TaskManager, so why can a TM
>>       with 2 slots run 2 subtasks of the same task (in the same JVM)
>>       while a TM with 1 slot cannot (while it can execute an arbitrary
>>       number of subtasks of different tasks)?
>>    3. If it is not so, there are no images representing such a situation
>>    on that page...
>>    4. Isn't it dangerous to allow (potentially) an unlimited number of
>>    threads per TM slot??
>>
>> Cheers,
>> Flavio
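
P.S. The slot-sharing rule as I now understand it from this thread (a slot holds at most one subtask of each task, so each slice Source(i) -> Map(i) -> Sink(i) fills one slot) can be modelled in a few lines of plain Java. This is only a toy model under that assumption, not Flink's scheduler; `SlotSharingSketch` and `assign` are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Flink code): places slice i of the execution graph into slot i. */
final class SlotSharingSketch {

    /** Returns one list per slot; slot i holds subtask i of every task. */
    static List<List<String>> assign(List<String> tasks, int parallelism) {
        List<List<String>> slots = new ArrayList<>();
        for (int i = 1; i <= parallelism; i++) {
            List<String> slice = new ArrayList<>();
            for (String task : tasks) {
                // One subtask per task and slot, e.g. "Source(1)".
                slice.add(task + "(" + i + ")");
            }
            slots.add(slice);
        }
        return slots;
    }

    public static void main(String[] args) {
        // The example from the thread: Source -> Map -> Sink, parallelism 2.
        List<List<String>> slots = assign(Arrays.asList("Source", "Map", "Sink"), 2);
        for (int s = 0; s < slots.size(); s++) {
            // In this model each subtask in a slot corresponds to one thread.
            System.out.println("slot " + (s + 1) + ": " + slots.get(s)
                    + " (" + slots.get(s).size() + " threads)");
        }
    }
}
```

In this model the number of slots needed equals the parallelism, and the number of threads per slot equals the number of unchained tasks, which is why long pipelines can end up with many threads in one slot.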