I think Timo answered both questions (quoting Michael: "Hey Timo, yes that is what I needed to know. Thanks").
Maybe one more comment: the examples are not meant to show the best performance but to showcase Flink's APIs and concepts.

Best, Fabian

2015-08-14 17:43 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Any insight about these 2 questions..?
> On 12 Aug 2015 17:38, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>
>> This is something I've never understood in depth: isn't a mapper created
>> for each record? If it's created only once per task manager then it's not so
>> different from mapPartition. What am I missing here?
>>
>> And then a more philosophical question: all big data frameworks somehow need
>> to manage memory very efficiently (Flink even reserves
>> a fraction of the entire memory in order to have control over it). Wouldn't
>> it be simpler if Java finally released some APIs (even marked as unsafe,
>> it doesn't change that much) to allow for full control of the
>> memory? It would make a lot of sense for all big data platforms (at least
>> for non-UDF code...).
>>
>> Best,
>> Flavio
>> On 12 Aug 2015 12:44, "Timo Walther" <twal...@apache.org> wrote:
>>
>>> Hello Michael,
>>>
>>> every time you write a Java program you should avoid object creation if
>>> you want an efficient program, because every created object needs to be
>>> garbage collected later (which slows down your program).
>>> You can have small Pojos, just try to avoid calling "new" in your
>>> functions:
>>>
>>> Instead of:
>>>
>>> class Mapper implements MapFunction<String,Pojo> {
>>>     public Pojo map(String s) {
>>>         Pojo p = new Pojo();  // new object for every record
>>>         p.f = s;
>>>         return p;
>>>     }
>>> }
>>>
>>> do:
>>>
>>> class Mapper implements MapFunction<String,Pojo> {
>>>     private Pojo p = new Pojo();  // created once, reused for every record
>>>     public Pojo map(String s) {
>>>         p.f = s;
>>>         return p;
>>>     }
>>> }
>>>
>>> Then an object is only created once per Mapper and not per record.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Timo
>>>
>>> On 12.08.2015 11:53, Michael Huelfenhaus wrote:
>>>
>>>> Hello
>>>>
>>>> I have a question about programming user-defined functions: is
>>>> it still the case, as in the old Stratosphere times, that object creation
>>>> should be avoided at all costs? Because in some of the examples
>>>> Tuples and other objects are now created before returning them.
>>>>
>>>> I am going to have a streaming plan with at least 6 steps and I am going to use
>>>> Pojos. Performance-wise, is it a big improvement to define one big Pojo that
>>>> can be used by all the steps, or is it better to have smaller ones that send less
>>>> data but create more objects?
>>>>
>>>> Thanks
>>>> Michael
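
For reference, here is a minimal self-contained sketch of the two mapper variants Timo describes. The MapFunction interface below is a local stand-in, declared only so the code compiles without Flink on the classpath (an assumption, not Flink's actual class), and ObjectReuseSketch and distinctInstances are hypothetical names used for illustration. It counts how many distinct Pojo instances each variant hands out, and it shows the catch of the reusing variant: a reference kept from an earlier call sees the value of the later record.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class ObjectReuseSketch {

    // Hypothetical stand-in for Flink's MapFunction, declared locally
    // so this sketch compiles and runs without any Flink dependency.
    interface MapFunction<I, O> {
        O map(I value);
    }

    static class Pojo {
        String f;
    }

    // Variant 1: allocates a new Pojo for every record.
    static class AllocatingMapper implements MapFunction<String, Pojo> {
        @Override
        public Pojo map(String s) {
            Pojo p = new Pojo();
            p.f = s;
            return p;
        }
    }

    // Variant 2: allocates one Pojo per mapper instance and overwrites it per record.
    static class ReusingMapper implements MapFunction<String, Pojo> {
        private final Pojo p = new Pojo();

        @Override
        public Pojo map(String s) {
            p.f = s;
            return p;
        }
    }

    // Counts how many distinct object instances (by identity) the mapper returns.
    static int distinctInstances(MapFunction<String, Pojo> mapper, String... records) {
        Set<Pojo> seen = Collections.newSetFromMap(new IdentityHashMap<Pojo, Boolean>());
        for (String record : records) {
            seen.add(mapper.map(record));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(new AllocatingMapper(), "a", "b", "c")); // prints 3
        System.out.println(distinctInstances(new ReusingMapper(), "a", "b", "c"));    // prints 1

        // The catch with reuse: an earlier result is overwritten by the next call.
        ReusingMapper mapper = new ReusingMapper();
        Pojo first = mapper.map("a");
        mapper.map("b");
        System.out.println(first.f); // prints b, not a
    }
}
```

Whether the reused instance is safe in a given pipeline depends on whether anything downstream holds on to emitted objects. That is not covered in this thread, so treat it as something to check rather than a given.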