If the system has to decide on a data shipping strategy for a join (e.g., broadcasting one side), it helps to have good estimates of the input sizes.
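
To make this concrete, here is a minimal sketch in plain Java of how a size estimate could tip the choice between broadcasting and repartitioning. It is not Flink's optimizer code: the class `JoinShippingSketch`, the methods `choose` and `estimateAfterFilter`, the cost model (count of records shipped over the network), and the selectivity value are all assumptions made up for illustration. The only fact it relies on is that a filter emits at most one record per input record, so the input size is a safe upper bound for the output size.

```java
public class JoinShippingSketch {

    enum Strategy { BROADCAST_LEFT, BROADCAST_RIGHT, REPARTITION_BOTH }

    // Hypothetical cost model: number of records sent over the network.
    // Broadcasting copies one side to every parallel worker;
    // repartitioning ships every record of both sides once.
    static Strategy choose(long leftRows, long rightRows, int parallelism) {
        long broadcastLeft = leftRows * parallelism;
        long broadcastRight = rightRows * parallelism;
        long repartition = leftRows + rightRows;

        if (broadcastLeft <= broadcastRight && broadcastLeft < repartition) {
            return Strategy.BROADCAST_LEFT;
        }
        if (broadcastRight < repartition) {
            return Strategy.BROADCAST_RIGHT;
        }
        return Strategy.REPARTITION_BOTH;
    }

    // A filter can only drop records, so the estimate is clamped to
    // the range [0, inputRows]. The selectivity is an assumed guess.
    static long estimateAfterFilter(long inputRows, double assumedSelectivity) {
        long estimate = Math.round(inputRows * assumedSelectivity);
        return Math.max(0L, Math.min(inputRows, estimate));
    }

    public static void main(String[] args) {
        long customers = 5_000_000L;   // made-up input sizes
        long orders = 10_000_000L;
        int parallelism = 8;

        // Without the filter, both inputs are large.
        System.out.println(choose(customers, orders, parallelism));

        // With a selective filter on customers, the left side shrinks.
        long filtered = estimateAfterFilter(customers, 0.01);
        System.out.println(choose(filtered, orders, parallelism));
    }
}
```

With these made-up numbers, the unfiltered join favors repartitioning both sides, while the filtered left input becomes small enough that broadcasting it is cheaper under this cost model.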

On 04.05.2015 14:53, Flavio Pompermaier wrote:
Thanks Sebastian and Fabian for the feedback, just one last question:
what changes from the system's point of view when it knows that the
number of output tuples is <= the number of input tuples?
Is there any optimization that Flink can apply to the pipeline?

On Mon, May 4, 2015 at 2:49 PM, Fabian Hueske <fhue...@gmail.com
<mailto:fhue...@gmail.com>> wrote:

    It should not make a difference. I think it's just personal taste.

    If your filter condition is simple enough, I'd go with Flink's Table
    API because it does not require you to define a FilterFunction or
    FlatMapFunction.


    2015-05-04 14:43 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it
    <mailto:pomperma...@okkam.it>>:

        Hi Flinkers,

        I'd like to know whether it's better to use filter followed by
        project, or a single flatMap, to filter tuples and then project
        them. Functionally they are equivalent, but maybe I'm missing
        something.

        Thanks in advance,
        Flavio



