Hi,

first of all, Cross is a *very* expensive operation, if you cannot ensure
that one side is very small. If one input fits into memory, it is usually
better to use a MapFunction with a broadcast set. If both sides can be
large, Cross will take a very long time.
That being said, the strategies work as follow:
- NESTEDLOOP_STREAMED_OUTER_FIRST: The first (or left) input is the the
outer side of a nested-loop. The second (or right) input is buffered
(potentially spilled to disk). For each record of the outer input, we read
and combine all values of the spilled inner with the outer record. Hence,
the order of the outer side is preserved.
- NESTEDLOOP_BLOCKED_OUTER_FIRST: first (or left) input is the the outer
side of a nested-loop. The second (or right) input is buffered (potentially
spilled to disk). The outer side is consumed in blocks of records. For each
block of outer records, the inner side is read and each record of the inner
side is combined with all outer records in the block. This strategy
destroys the sort order of the outer side.

The other strategies switch outer and inner side and are symmetric.

The benefit of the blocked strategy is that we only iterate once per block
over the inner side and not for each individual records as the streamed
strategy does. However, the blocked variant destroys the order of the outer
side.

Best, Fabian

2017-04-04 11:21 GMT+02:00 gen-too <gen-...@gmx-topmail.de>:

> Hi,
>
> I would like to knot how the Flink cross function works. I found that
> there are four strategies ( NESTEDLOOP_BLOCKED_OUTER_FIRST,
> NESTEDLOOP_BLOCKED_OUTER_SECOND, NESTEDLOOP_STREAMED_OUTER_FIRST,
> NESTEDLOOP_STREAMED_OUTER_SECOND), but I need some more detailed
> explanations please.
>
> Thanks
>

Reply via email to