Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-07-05 Thread sihua zhou
Hi Stephan, Thank you very much for the reply, I'm very happy to get it! I'm not sure whether I understood your idea correctly. Does it mean 1) we should add a new operator with the feature of Elastic Bloom Filter? or 2) we could support it as the current (version <= 1.5)

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-07-04 Thread Stephan Ewen
Hi Sihua! Sorry for joining this discussion late. I can see the benefit of such a feature and also see the technical merit. It is a nice piece of work and a good proposal. I am wondering if there is a way to add such a technique as a "library operator", or whether it needs a deep integration

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-06-12 Thread sihua zhou
Hi, I would like to add more information concerning the Linked Filter Nodes on each key group. The reason we need to maintain Linked Filter Nodes is that we need to handle data skew, which is also the most challenging problem we need to overcome. Because we don't

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-06-12 Thread sihua zhou
Hi Fabian, Thanks a lot for your reply. You are right that users would need to configure a TTL for the Elastic Filter to recycle the memory resource. For each list of Linked BloomFilter Nodes, only the head node is writable; the other nodes are all full and immutable (read-only, we
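
The linked-node idea described in this message (only the head node accepts writes, full nodes become read-only, a TTL lets old nodes be dropped) can be illustrated with a minimal Python sketch. This is not the code from the proposal or from Flink: the class names `BloomFilterNode` and `LinkedBloomFilter`, the per-node `capacity`, the SHA-256 based double hashing, and the choice to measure the TTL from a node's last write are all assumptions made for illustration.

```python
import hashlib
import math


class BloomFilterNode:
    """Fixed-capacity Bloom filter (hypothetical helper, not a Flink class)."""

    def __init__(self, capacity, fpp):
        self.capacity = capacity
        self.count = 0
        self.last_write = None
        # Standard sizing: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2
        self.num_bits = max(1, math.ceil(-capacity * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def is_full(self):
        return self.count >= self.capacity

    def add(self, key, now):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1
        self.last_write = now

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class LinkedBloomFilter:
    """List of nodes; nodes[-1] is the writable head, earlier nodes are read-only."""

    def __init__(self, capacity, fpp, ttl):
        self.capacity, self.fpp, self.ttl = capacity, fpp, ttl
        self.nodes = []

    def expire(self, now):
        # Assumption: a node is dropped once `ttl` has passed since its last write.
        self.nodes = [n for n in self.nodes if now - n.last_write < self.ttl]

    def add(self, key, now):
        self.expire(now)
        # Allocate a new head only when there is none or the current head is full.
        if not self.nodes or self.nodes[-1].is_full():
            self.nodes.append(BloomFilterNode(self.capacity, self.fpp))
        self.nodes[-1].add(key, now)

    def might_contain(self, key, now):
        self.expire(now)
        # A key may live in any node, so every node has to be consulted.
        return any(n.might_contain(key) for n in self.nodes)
```

With `capacity=2`, adding three keys produces two nodes: the first is full and is only read from then on, while the second is the writable head. Because the head grows only when it fills up, a skewed key group allocates more nodes than a quiet one, and expired nodes are dropped whole rather than having their keys re-inserted.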

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-06-12 Thread Fabian Hueske
Hi Sihua, Sorry for not replying earlier. I have one question left. If I understood the design of the linked Bloomfilter nodes right, users would need to configure a TTL to be able to remove a node. When nodes are removed, we would need to insert every key into the current node which would not be

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-06-12 Thread sihua zhou
Hi, there has been no more feedback these days... I guess it's because you are all too busy. Since I didn't receive any negative feedback and there is already some positive feedback, I want to implement this *Elastic Bloom Filter* based on the current design doc (because I have time to do it

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-05-29 Thread sihua zhou
Hi, I did a survey of the variants of the Bloom filter and the Cuckoo filter these days. In the end, I found 3 of them that may be adaptable for our purpose. 1. standard Bloom filter (which our implementation is based on and which we have used in production with good experience) 2. Cuckoo filter, also a very

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-05-23 Thread Elias Levy
I would suggest you consider an alternative data structure: a Cuckoo Filter or a Golomb Compressed Sequence. The GCS data structure was introduced in Cache-, Hash- and Space-Efficient Bloom Filters by F. Putze, P. Sanders,

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-05-23 Thread sihua zhou
sues.apache.org/jira/browse/FLINK-8918> 2018-05-23 9:56 GMT+02:00 sihua zhou <summerle...@163.com>: Hi Devs! I propose to introduce "Elastic Bloom Filter" for Flink. The reason I am making this proposal is that it helped us a lot in production

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-05-23 Thread Stefan Richter
emory > boundaries over time? > > Best, > Fabian > > [1] https://issues.apache.org/jira/browse/FLINK-8918 > > 2018-05-23 9:56 GMT+02:00 sihua zhou <summerle...@163.com>

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-05-23 Thread Fabian Hueske
/browse/FLINK-8918 2018-05-23 9:56 GMT+02:00 sihua zhou <summerle...@163.com>: > Hi Devs! > I propose to introduce "Elastic Bloom Filter" for Flink. The reason I > am making this proposal is that it helped us a lot in production; it lets us > improve the performan

[PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-05-23 Thread sihua zhou
Hi Devs! I propose to introduce "Elastic Bloom Filter" for Flink. The reason I am making this proposal is that it helped us a lot in production; it lets us improve the performance while reducing the consumption of resources. Here is a brief description of the motivation of why it's so po