Re: Join Dataset in stream

2018-11-15 Thread Ken Krugler
Hi Eric,

This sounds like a use case for BroadcastProcessFunction.

You’d use the Cassandra dataset as the source for the broadcast stream, which
is distributed to every parallel instance of your custom
BroadcastProcessFunction. The input vectors are a partitioned stream that’s the
other input to this function (via its processElement() method). The two streams
get connected as a BroadcastConnectedStream.

Note that as of Flink 1.5 it’s also easy to maintain the broadcast state.
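
To make the division of work concrete, here is a minimal plain-Java sketch of
what the two callbacks would do. It is not Flink code: the class name
NearestVectors and the methods onDatasetRow() and onInputVector() are
hypothetical stand-ins for processBroadcastElement() and processElement(), and
a HashMap stands in for the broadcast state. It also assumes every vector has
the same dimension and that each Cassandra row has a String key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Plain-Java stand-in for the two callbacks of a broadcast function. */
class NearestVectors {
    // Stand-in for broadcast state: row key -> vector (assumed layout).
    private final Map<String, double[]> dataset = new HashMap<>();
    private final int n;

    NearestVectors(int n) {
        this.n = n;
    }

    // Role of processBroadcastElement(): insert or overwrite one dataset row.
    void onDatasetRow(String key, double[] vector) {
        dataset.put(key, vector);
    }

    // Role of processElement(): keys of the n closest rows, nearest first.
    List<String> onInputVector(double[] input) {
        // Max-heap on distance: the head is the farthest candidate kept so far.
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(
                (a, b) -> Double.compare(b.getValue(), a.getValue()));
        for (Map.Entry<String, double[]> row : dataset.entrySet()) {
            double d = distance(input, row.getValue());
            heap.add(new AbstractMap.SimpleEntry<>(row.getKey(), d));
            if (heap.size() > n) {
                heap.poll(); // evict the farthest candidate
            }
        }
        // The heap yields farthest first, so prepend to get nearest first.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(0, heap.poll().getKey());
        }
        return result;
    }

    // Euclidean distance; assumes both arrays have the same length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

Because every parallel instance sees the whole broadcast dataset, each input
vector can be scored locally without a shuffle. The bounded heap keeps the
per-vector work at O(m log n) for m dataset rows.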

— Ken

> On Nov 14, 2018, at 11:32 PM, eric hoffmann wrote:
> 
> 
> Hi.
> I need to compute the Euclidean distance between an input vector and a full
> dataset stored in Cassandra, and keep the n lowest values. The Cassandra
> dataset is evolving (mutable). I could do this in a batch job, but I would
> have to trigger it each time. The input is more like a slow stream, but
> the computation needs to be fast. Can I do this in a streaming way? Is there
> a better solution?
> Thx

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com 
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra



Join Dataset in stream

2018-11-14 Thread eric hoffmann
Hi.
I need to compute the Euclidean distance between an input vector and a full
dataset stored in Cassandra, and keep the n lowest values. The Cassandra
dataset is evolving (mutable). I could do this in a batch job, but I would
have to trigger it each time. The input is more like a slow stream, but
the computation needs to be fast. Can I do this in a streaming way? Is there
a better solution?
Thx