Hi,

if the workarounds that Xingcan and I mentioned are not options for your use case, then I think this might currently be the better option. But I would expect better support for stream joins in the near future.
Best,
Stefan

> On 31.01.2018 at 07:04, Marchant, Hayden <hayden.march...@citi.com> wrote:
>
> Stefan,
>
> So are we essentially saying that in this case, for now, I should stick to
> the DataSet / Batch Table API?
>
> Thanks,
> Hayden
>
> -----Original Message-----
> From: Stefan Richter [mailto:s.rich...@data-artisans.com]
> Sent: Tuesday, January 30, 2018 4:18 PM
> To: Marchant, Hayden [ICG-IT] <hm97...@imceu.eu.ssmb.com>
> Cc: user@flink.apache.org; Aljoscha Krettek <aljos...@apache.org>
> Subject: Re: Joining data in Streaming
>
> Hi,
>
> as far as I know, this is not easily possible. What would be required is
> something like a CoFlatMap function, where one input stream blocks until
> the second stream is fully consumed to build up the state to join against.
> Maybe Aljoscha (in CC) can comment on future plans to support this.
>
> Best,
> Stefan
>
>> On 30.01.2018 at 12:42, Marchant, Hayden <hayden.march...@citi.com> wrote:
>>
>> We have a use case where we have 2 data sets: one reasonably large data set
>> (a few million entities) and a smaller set of data. We want to do a join
>> between these data sets. We will be doing this join after both data sets are
>> available. In the world of batch processing, this is pretty straightforward:
>> we'd load both data sets into an application and execute a join operator
>> on them through a common key. Is it possible to do such a join using the
>> DataStream API? I would assume that I'd use the connect operator, though I'm
>> not sure exactly how I should do the join. Do I need the 'smaller' set to
>> be completely loaded into state before I start flowing the large set? My
>> concern is that if I read both data sets from streaming sources, since the
>> order in which the data is loaded is not guaranteed, I may lose lots of
>> potential joined entities because their pairs might not have been read yet.
>>
>> Thanks,
>> Hayden Marchant
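
For illustration of the ordering concern in the original question, here is a minimal sketch of one possible approach: buffer every record from both inputs per join key, and emit a joined pair as soon as its counterpart has been seen, whichever side arrives first. This is not necessarily one of the workarounds referred to in the thread. It is plain Java rather than Flink API; the class name TwoSidedBufferJoin, the methods onLeft/onRight, and the "key:left|right" output format are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only: plain Java, not Flink API. */
public class TwoSidedBufferJoin {
    // Records seen so far on each input, grouped by join key.
    private final Map<String, List<String>> leftBuffer = new HashMap<>();
    private final Map<String, List<String>> rightBuffer = new HashMap<>();

    /** Buffers a record from the first input and returns the pairs it completes. */
    public List<String> onLeft(String key, String value) {
        leftBuffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        // Join against everything already buffered from the other input.
        for (String right : rightBuffer.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(key + ":" + value + "|" + right);
        }
        return out;
    }

    /** Buffers a record from the second input and returns the pairs it completes. */
    public List<String> onRight(String key, String value) {
        rightBuffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String left : leftBuffer.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(key + ":" + left + "|" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        TwoSidedBufferJoin join = new TwoSidedBufferJoin();
        // The large-side record arrives first: nothing to join against yet.
        System.out.println(join.onLeft("k1", "large-1"));
        // Its counterpart arrives later: the pair is emitted now, not lost.
        System.out.println(join.onRight("k1", "small-1"));
        // A later large-side record joins against the buffered small-side one.
        System.out.println(join.onLeft("k1", "large-2"));
    }
}
```

In a Flink job, logic of this shape would presumably sit in a two-input function applied to connected, keyed streams, with the two buffers held in keyed state. Note the trade-off: because both sides are buffered indefinitely, state grows with the size of both data sets unless it is cleaned up at some point.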