If the system has to decide data shipping strategies for a join (e.g.,
broadcasting one side) it helps to have good estimates of the input sizes.
On 04.05.2015 14:53, Flavio Pompermaier wrote:
Thanks Sebastian and Fabian for the feedback, just one last question:
what does change from the system point of view to know that the output
tuples is <= the number of input tuples?
Is there any optimization that Flink can apply to the pipeline?
On Mon, May 4, 2015 at 2:49 PM, Fabian Hueske <fhue...@gmail.com
<mailto:fhue...@gmail.com>> wrote:
It should not make a difference. I think its just personal taste.
If your filter condition is simple enough, I'd go with Flink's Table
API because it does not require to define a Filter or FlatMapFunction.
2015-05-04 14:43 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it
<mailto:pomperma...@okkam.it>>:
Hi Flinkers,
I'd like to know whether it's better to perform a filter.project
or a flatMap to filter tuples and do some projection after the
filter. Functionally they are equivalent but maybe I'm ignoring
something..
Thanks in advance,
Flavio