Re: Join Dataset in stream

2018-11-15 Thread Ken Krugler

Hi Eric,

This sounds like a use case for BroadcastProcessFunction.

You’d use the Cassandra dataset as the source for the broadcast stream, which 
is distributed to every parallel instance of your custom 
BroadcastProcessFunction. The input vectors are a partitioned stream that’s the 
other input to this function (via its processElement() method). The two streams 
get connected as a BroadcastConnectedStream.

Note that as of Flink 1.5 it’s also easy to maintain the broadcast state.
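
To make the per-element work concrete, here is a minimal sketch of the distance computation that would run inside processElement() for each input vector, iterating over whatever copy of the Cassandra rows the function holds in broadcast state. It deliberately uses only the Java standard library and no Flink API; the names NearestVectors, Match and nearest() are hypothetical, and the dataset is assumed to fit in memory as a map from row id to vector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class NearestVectors {

    /** A dataset row id paired with its distance to the query vector. */
    public static final class Match {
        public final String id;
        public final double distance;

        Match(String id, double distance) {
            this.id = id;
            this.distance = distance;
        }
    }

    /** Euclidean distance between two vectors of equal dimension. */
    public static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the n dataset rows closest to the query, nearest first. */
    public static List<Match> nearest(double[] query, Map<String, double[]> dataset, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap on distance: the head is the worst of the best n seen so far.
        PriorityQueue<Match> heap = new PriorityQueue<>(
                n, Comparator.comparingDouble((Match m) -> m.distance).reversed());
        for (Map.Entry<String, double[]> e : dataset.entrySet()) {
            double d = euclidean(query, e.getValue());
            if (heap.size() < n) {
                heap.add(new Match(e.getKey(), d));
            } else if (d < heap.peek().distance) {
                heap.poll();                       // evict the current worst
                heap.add(new Match(e.getKey(), d));
            }
        }
        List<Match> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(m -> m.distance));
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the rows that would arrive via the broadcast stream.
        Map<String, double[]> dataset = new LinkedHashMap<>();
        dataset.put("a", new double[] {0.0, 0.0});
        dataset.put("b", new double[] {3.0, 4.0});
        dataset.put("c", new double[] {1.0, 1.0});

        for (Match m : nearest(new double[] {0.0, 0.0}, dataset, 2)) {
            System.out.println(m.id + " " + m.distance);
        }
    }
}
```

The bounded heap keeps memory at n entries per query and costs O(m log n) for m dataset rows, so a mutable dataset only has to update the map, not any precomputed ranking.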

— Ken

> On Nov 14, 2018, at 11:32 PM, eric hoffmann wrote:
> 
> 
> Hi.
> I need to compute the Euclidean distance between an input vector and a full 
> dataset stored in Cassandra, and keep the n lowest values. The Cassandra 
> dataset is evolving (mutable). I could do this in a batch job, but I would 
> have to trigger it each time, and the input is more like a slow stream, while 
> the computation needs to be fast. Can I do this in a streaming way? Is there 
> any better solution?
> Thx

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com 
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra

Join Dataset in stream

2018-11-14 Thread eric hoffmann

Hi.
I need to compute the Euclidean distance between an input vector and a full
dataset stored in Cassandra, and keep the n lowest values. The Cassandra
dataset is evolving (mutable). I could do this in a batch job, but I would
have to trigger it each time, and the input is more like a slow stream, while
the computation needs to be fast. Can I do this in a streaming way? Is there
any better solution?
Thx
