Ok. The state persistence concurrency issue I mentioned below in the Flink cluster is finally resolved. I tried everything with Redis; there is a concurrency/locking issue that I couldn't figure out. I replaced Redis with plain Java 1.7 ConcurrentHashMaps, only for the keyed "state" values... magic! And you get a bonus as well: it gets REALLY fast.

Thanks + regards,
Amir
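(For context, a minimal sketch of that kind of swap, assuming the state is keyed by a String id; the class name, value type, and update logic below are hypothetical illustrations, not the app's actual code:)

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class KeyedStateStore {
        // In-process, thread-safe map standing in for the external Redis store.
        // Key/value types are hypothetical; adapt to the real "state" shape.
        private static final ConcurrentMap<String, Long> state =
                new ConcurrentHashMap<String, Long>();

        // Java 7 compare-and-swap loop for an atomic read-modify-write;
        // on Java 8+ this could be state.merge(key, delta, Long::sum).
        public static long addAndGet(String key, long delta) {
            for (;;) {
                Long cur = state.get(key);
                if (cur == null) {
                    if (state.putIfAbsent(key, delta) == null) {
                        return delta;
                    }
                } else if (state.replace(key, cur, cur + delta)) {
                    return cur + delta;
                }
            }
        }
    }

(One caveat with this approach: an in-process map is only shared while all parallel pipeline instances run in one JVM; across multiple Flink TaskManagers the state would no longer be shared.)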
From: Raghu Angadi <[email protected]>
To: amir bahmanyari <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Wednesday, July 27, 2016 10:16 AM
Subject: Re: Avoid reading duplicates from KafkaIO!!!
Thanks for reporting your findings back.
I don't know many details of your app, but I strongly suggest you do as many
aggregations as possible in Dataflow itself rather than using external
storage. This will be a lot more scalable and, more importantly, much simpler
to manage and change in the future.
Raghu.
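(As a sketch of what in-pipeline aggregation can look like with the Beam Java SDK; the events/totals names and the one-minute windowing are assumptions for illustration:)

    import org.apache.beam.sdk.transforms.Sum;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    // events: a PCollection<KV<String, Long>> produced upstream (e.g. from KafkaIO).
    PCollection<KV<String, Long>> totals = events
            .apply(Window.<KV<String, Long>>into(
                    FixedWindows.of(Duration.standardMinutes(1))))
            // The runner manages the aggregation state internally,
            // so no external Redis map is involved.
            .apply(Sum.<String>longsPerKey());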
On Wed, Jul 27, 2016 at 10:06 AM, amir bahmanyari <[email protected]> wrote:

Hi Raghu,

Kafka is OK sending one record at a time; I proved it. The issue is
maintaining concurrency in the Redis maps that are being accessed and modified
by parallel pipelines at runtime. The variability of the apparent duplication
is due to the unpredictable number of parallel pipelines accessing a shared
Redis map during each separate run. That's my challenge at the moment.

Thanks for your help,
Amir
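(To make the failure mode concrete, a hedged sketch of the non-atomic read-modify-write pattern that loses updates when parallel workers hit the same key; the Jedis calls are standard, but the key name and logic are hypothetical, not the app's actual code:)

    import redis.clients.jedis.Jedis;

    public class RedisRaceSketch {
        public static void main(String[] args) {
            Jedis jedis = new Jedis("localhost", 6379);

            // Racy from parallel workers: two workers can read the same
            // value, and one write silently overwrites the other.
            String v = jedis.get("state:key1");
            long n = (v == null) ? 0L : Long.parseLong(v);
            jedis.set("state:key1", Long.toString(n + 1)); // lost update

            // Safe alternative: let Redis do the read-modify-write atomically.
            jedis.incr("state:key1");

            jedis.close();
        }
    }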

From: Raghu Angadi <[email protected]>
To: [email protected]; amir bahmanyari <[email protected]>
Sent: Tuesday, July 26, 2016 2:34 PM
Subject: Re: Avoid reading duplicates from KafkaIO!!!

On Tue, Jul 26, 2016 at 2:29 PM, amir bahmanyari <[email protected]> wrote:

I know this is not a Kafka forum, but could Kafka be sending redundant records?

No, equally unlikely. You could verify whether you pushed duplicates to Kafka
by reading directly with the Kafka console consumer, e.g.:

    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic my_topic --from-beginning | grep ....
