Hi Stephan,

Cheers

On Fri, Sep 4, 2015 at 2:31 PM, Stephan Ewen <se...@apache.org> wrote:

> We will definitely also try to get the chaining overhead down a bit.
>
> BTW: To reach this kind of throughput, you need sources that can produce
> very fast...
>
> On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi Stephan,
>>
>> That's good information to know. We will hit that throughput easily. Our
>> computation graph has a lot of chaining like this right now.
>> I think it's safe to minimize the chaining for now.
>>
>> Thanks a lot for this, Stephan.
>>
>> Cheers
>>
>> On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> In a set of benchmarks a while back, we found that the chaining
>>> mechanism has some overhead right now, because of its abstraction. The
>>> abstraction creates iterators for each element and makes it hard for the
>>> JIT to specialize on the operators in the chain.
>>>
>>> For purely local chains at full speed, this overhead is observable (it
>>> can decrease throughput from 25 million elements per core to 15-20
>>> million elements per core). If your job does not reach that throughput,
>>> or is I/O bound, source bound, etc., it does not matter.
>>>
>>> If you care about super high performance, collapsing the code into one
>>> function helps.
>>>
>>> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05...@gmail.com>
>>> wrote:
>>>
>>>> Hi Gyula,
>>>>
>>>> Thanks for your response. It seems I will use filter and map for now,
>>>> as that really makes the intention clear and is not a big performance
>>>> hit.
>>>>
>>>> Thanks again.
>>>>
>>>> Cheers
>>>>
>>>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.f...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey Welly,
>>>>>
>>>>> If you call filter and map one after the other like you mentioned,
>>>>> these operators will be chained and executed as if they were running
>>>>> in the same operator.
>>>>>
>>>>> The only small performance overhead comes from the fact that the
>>>>> output of the filter will be copied before passing it as input to the
>>>>> map to keep immutability guarantees (but no
>>>>> serialization/deserialization will happen). Copying might be
>>>>> practically free depending on your data type, though.
>>>>>
>>>>> If you are using operators that don't rely on the immutability of
>>>>> inputs/outputs (i.e., you don't hold references to those values), then
>>>>> you can disable copying altogether by calling
>>>>> env.getConfig().enableObjectReuse(), in which case they will have
>>>>> exactly the same performance.
>>>>>
>>>>> Cheers,
>>>>> Gyula
>>>>>
>>>>> Welly Tambunan <if05...@gmail.com> wrote (on Thu, Sep 3, 2015,
>>>>> 4:33):
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I would like to filter some items from the event stream. I think
>>>>>> there are two ways of doing this.
>>>>>>
>>>>>> One is the regular pipeline filter(...).map(...). We can also use
>>>>>> flatMap to do both in the same operator.
>>>>>>
>>>>>> Is there any performance improvement if we use flatMap, given that
>>>>>> it will be done in one operator instance?
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> --
>>>>>> Welly Tambunan
>>>>>> Triplelands
>>>>>>
>>>>>> http://weltam.wordpress.com
>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>

--
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>
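
For readers following the filter(...).map(...) versus flatMap question above, here is a minimal sketch of the two shapes. It is an illustration only: it uses plain java.util.stream as a stand-in rather than the Flink DataStream API, and the names keep and transform are hypothetical placeholders for whatever predicate and mapping a job would actually use. It shows that both forms produce the same output; it says nothing about the chaining or copying overhead discussed in the thread.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilterMapVsFlatMap {

    // Hypothetical predicate: keep even values only.
    static boolean keep(int value) {
        return value % 2 == 0;
    }

    // Hypothetical transformation: scale the value by ten.
    static int transform(int value) {
        return value * 10;
    }

    // Two separate steps: filter first, then map the survivors.
    static List<Integer> viaFilterMap(List<Integer> events) {
        return events.stream()
                .filter(FilterMapVsFlatMap::keep)
                .map(FilterMapVsFlatMap::transform)
                .collect(Collectors.toList());
    }

    // One fused step: emit zero or one output per input element.
    static List<Integer> viaFlatMap(List<Integer> events) {
        return events.stream()
                .flatMap(e -> keep(e)
                        ? Stream.of(transform(e))
                        : Stream.<Integer>empty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> events = List.of(1, 2, 3, 4);
        System.out.println(viaFilterMap(events)); // prints [20, 40]
        System.out.println(viaFlatMap(events));   // prints [20, 40]
    }
}
```

The two-step form keeps the intent of each stage visible, which is the readability argument made in the thread; the fused form corresponds to "collapsing the code into one function".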