Hi!

Your RefDataPriceJoiner should implement KeyedBroadcastProcessFunction
instead of KeyedCoProcessFunction. See the Java docs of DataStream#connect.
What's your Flink version by the way?

James Sandys-Lumsdaine <jas...@hotmail.com> 于2021年9月7日周二 上午12:02写道:

> Hello,
>
> I have a Flink workflow which is partitioned on a key common to all the
> stream objects and a key that is best suited to the high volume of data I
> am processing. I now want to add in a new stream of prices that I want to
> make available to all partitioned streams - however, this new stream of
> prices does not have this common keyBy value.
>
> I have tried writing a piece of code using then broadcast() method (no
> args) to get this new price stream to be sent to all the parallel instances
> on an operator. The code looks like this:
>
> KeyedStream<RefData> keyedRefDataStream = ....;
>
> DataStream<Price> prices = ....;
> DataStream<Price> broadcastPrices = prices.broadcast();
>
> keyedRefDataStream
>     .connect(broadcastPrices)
>     .process(new RefDataPriceJoiner()); // implements
> KeyedCoProcessFunction
>
> I then get an error saying the broadcastPrices stream must be keyed - but
> I can't key it on the same key as the refData stream because it lacks this
> field.
>
> I could reshuffle all my data by re-keying the ref data on a different
> field but this will cause a huge amount of data to be sent over the network
> compared with me being able to broadcast this much smaller amount of data
> to my keyed streams. Note I am assuming this isn't a "broadcast state"
> example - I assume the broadcast() method allows me to send data to all
> partitions.
>
> Is any of this possible? Any pointers for me would be very helpful as I
> can't find answer on the web or in the documentation.
>
> Many thanks,
>
> James.
>

Reply via email to