Hi Eric,

This sounds like a use case for BroadcastProcessFunction 
<https://ci.apache.org/projects/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/functions/co/BroadcastProcessFunction.html>
  You’d use the Cassandra dataset as the source for the broadcast stream, which 
is distributed to every parallel instance of your custom 
BroadcastProcessFunction. The input vectors are a partitioned stream that’s the 
other input to this function (via its processElement() method). The two streams 
get connected as a BroadcastConnectedStream 
<https://ci.apache.org/projects/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/datastream/BroadcastConnectedStream.html>.

Note that as of Flink 1.5 it’s also easy to maintain the broadcast state 
<https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html>.

— Ken

> On Nov 14, 2018, at 11:32 PM, eric hoffmann <sigfrid.hoffm...@gmail.com 
> <mailto:sigfrid.hoffm...@gmail.com>> wrote:
> 
> 
> Hi.
> I need to compute an euclidian distance between an input Vector and a full 
> dataset stored in Cassandra and keep the n lowest value. The Cassandra 
> dataset is evolving (mutable). I could do this on a batch job, but i will 
> have to triger it each time and the input are more like a slow stream, but 
> the computing need to be fast can i do this on a stream way? is there any 
> better solution ?
> Thx

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com <http://www.scaleunlimited.com/>
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra

Reply via email to