Dear Philippe,

that is exactly what i need. Thank you for the concise explanation.

This approach is excellent, as it also permits the values to be easily 
updated externally.

Kind regards
Leon

30. May 2016 14:31 by philippe.capar...@orange.fr:


>
>
> Just transform the list in a DataStream. A datastream can be finite.
>
>
> One solution, in the context of a Streaming environment is to use Kafka, or 
> any other distributed broker, although Flink ships with a KafkaSource.
>
>  
>
> 1)Create a Kafka Topic dedicated to your list of key/values. Inject your 
> values into this topic, partitionned by the keys. So that you recover the 
> keys in Flink.
>
>  
>
> 2) Create a source for the stream of tuple your analysing -> output1 
> (Tuples).
>
>  
>
> 3) Create a KafkaSource, and parse/recover your key value pairs from this 
> source (e.g a first map operator) : map1 -> output 2 (K,V), then :
>
>  
>
>  
>
>  
>
>                  a)  If you need all key/Value pairs at each operator :  
> broadcast all partitions from the output 1 to the analysis operator
>
>  
>
>                   b) if you dont need all key/values pairs, just chain 
> output1 to the analysis operator. Partitioning of K,V pairs will depend on 
> Kafka partitioning strategy, and can be controlled in Flink      anyway.
>
>  
>
> 4) The analysis operator :  will perform a RichCoFlatMapFunction, and can 
> be Checkpointed.
>
> When receiving K,V pairs from output2, store them in a local state.
>
> When receiving tuple, should be able to to filter with the help of the 
> local state, and propagate downstream or not.
>
>  
>
>  
>
>  
>
>  
>
>  
>
>  
>
>  
>
>  
>
>  
>
>  
>
>  
>
>> > Message du 30/05/16 13:41
>> > De : >> leon_mcl...@tutanota.com
>> > A : "User" <>> user@flink.apache.org>> >
>> > Copie à :
>> > Objet : Elegantly sharing state in a streaming environment
>> >
>> >Hello Flink team,
>>
>> How can i partition and share static state among instances of a streaming 
>> operator?
>>
>> I have a huge list of keys and values, which are used to filter tuples in 
>> a stream. The list does not change. Currently i am sharing the list with 
>> each operator instance via the constructor, although only a subset of the 
>> list is required per operator (the assignment of subset to operator 
>> instance is known). I cannot use DataSet based functions in a streaming 
>> execution environment to assign sub lists. I also cannot use DataStream 
>> based partitioning functions as the list is static, i.e. not a DataStream. 
>> The dilemma exists as i am mixing static (DataSet type) content with 
>> streaming content. Is there any other approach aside from using an 
>> additional tool (e.g. distributed cache)?
>>
>> Thanks in advance.
>>
>> Regards
>> Leon
>>
>>
>>
>> 

Reply via email to