[jira] [Created] (FLINK-7474) Flink supports consuming data from and sinking data to Azure Event Hubs

2017-08-17 Thread Joe (JIRA)
Joe created FLINK-7474:
--

 Summary: Flink supports consuming data from and sinking data to Azure Event Hubs
 Key: FLINK-7474
 URL: https://issues.apache.org/jira/browse/FLINK-7474
 Project: Flink
  Issue Type: New Feature
  Components: Streaming Connectors
Affects Versions: 2.0.0, 1.4.0
 Environment: All platforms 
Reporter: Joe
 Fix For: 2.0.0, 1.4.0


Flink-azureeventhubs-connector (usage sketch below):

* Consume data from Event Hubs
* Snapshot and restore Event Hubs offsets
* Support for RecordWithTimestampAndPeriodicWatermark and RecordWithTimestampAndPunctuatedWatermark
* Sink data to Event Hubs
* Performance counters: receivedCount, prepareSendCount, commitSendCount
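To make the feature list concrete, here is a rough usage sketch patterned after the existing Kafka connector. {{FlinkEventHubsConsumer}}, {{FlinkEventHubsProducer}}, and the property keys are hypothetical names for illustration, not an agreed API:

{code}
// Hypothetical API sketch: the connector classes and property keys are
// illustrative, patterned on flink-connector-kafka (Flink 1.4-era APIs).
import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class EventHubsRoundTrip {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("eventhubs.namespace", "my-namespace"); // assumed key
        props.setProperty("eventhubs.name", "my-eventhub");       // assumed key
        props.setProperty("eventhubs.policy.name", "my-policy");  // assumed key
        props.setProperty("eventhubs.policy.key", "<key>");       // assumed key

        // Source with offset snapshot/restore handled by the connector (hypothetical).
        DataStream<String> stream = env.addSource(
                new FlinkEventHubsConsumer<>(new SimpleStringSchema(), props));

        // Sink back to Event Hubs (hypothetical).
        stream.addSink(new FlinkEventHubsProducer<>(new SimpleStringSchema(), props));

        env.execute("Event Hubs round trip");
    }
}
{code}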






[jira] [Created] (FLINK-7473) Possible Leak in GlobalWindows

2017-08-17 Thread Steve Jerman (JIRA)
Steve Jerman created FLINK-7473:
---

 Summary: Possible Leak in GlobalWindows
 Key: FLINK-7473
 URL: https://issues.apache.org/jira/browse/FLINK-7473
 Project: Flink
  Issue Type: Bug
  Components: DataStream API
Affects Versions: 1.3.2
 Environment: See attached project
Reporter: Steve Jerman


Hi,

I have been wrestling with an issue with GlobalWindows. It seems to leak
instances of InternalTimer.

I can't tell if it's a bug or my code, so I created a 'minimal' project that
has the issue...

If you run the unit test in the attached project and then monitor the heap,
you will see that the number of InternalTimers continually increases. I added
code to explicitly delete them; it doesn't seem to help.

If I comment out registerEventTimeTimer ... no leak :)

My suspicion is that PURGE/FIRE_AND_PURGE is leaving the timer in limbo.
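For reference, a minimal sketch of the pattern described above, assuming the Flink 1.3 Trigger API (the class name and timer offset are illustrative, not the code from the attached project):

{code}
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

// Illustrative trigger on GlobalWindows: every element registers an
// event-time timer and firing purges the window; the suspicion is that
// the purge leaves the InternalTimer instances behind.
public class PurgingEventTimeTrigger extends Trigger<Object, GlobalWindow> {

    @Override
    public TriggerResult onElement(Object element, long timestamp,
            GlobalWindow window, TriggerContext ctx) {
        ctx.registerEventTimeTimer(timestamp + 1000L); // commenting this out removes the leak
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) {
        return TriggerResult.FIRE_AND_PURGE; // suspected to leave the timer in limbo
    }

    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(GlobalWindow window, TriggerContext ctx) {
        // explicit timer deletion was attempted here, without effect
    }
}
{code}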

Steve





[jira] [Created] (FLINK-7472) Release task managers gracefully

2017-08-17 Thread Eron Wright (JIRA)
Eron Wright created FLINK-7472:
---

 Summary: Release task managers gracefully
 Key: FLINK-7472
 URL: https://issues.apache.org/jira/browse/FLINK-7472
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 


When a task manager is no longer needed (e.g. due to an idle timeout in the
slot manager), the RM should gracefully stop it without spurious warnings.
This implies that some actions should be taken before the TM is actually
killed. Proactive steps include stopping the heartbeat monitor and sending a
disconnect message.

It is unclear whether the `RM::closeTaskManagerConnection` method should be
called proactively (when we plan to kill a TM), reactively (after the TM is
killed), or both.
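A minimal sketch of the proactive sequence, using hypothetical interfaces (none of these names are the actual RM internals):

{code}
// Hypothetical interfaces, for illustration only.
interface HeartbeatMonitor { void stopMonitoring(String tmId); }
interface TaskManagerGateway { void disconnect(String reason); }
interface ResourceDriver { void terminate(String tmId); }

class GracefulTaskManagerRelease {

    private final HeartbeatMonitor heartbeats;
    private final ResourceDriver driver;

    GracefulTaskManagerRelease(HeartbeatMonitor heartbeats, ResourceDriver driver) {
        this.heartbeats = heartbeats;
        this.driver = driver;
    }

    void release(String tmId, TaskManagerGateway tm) {
        heartbeats.stopMonitoring(tmId); // no heartbeat timeout -> no spurious warning
        tm.disconnect("idle timeout");   // tell the TM it is being released
        driver.terminate(tmId);          // only then actually kill it
    }
}
{code}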





[jira] [Created] (FLINK-7471) Improve bounded OVER support non-retract method AGG

2017-08-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-7471:
--

 Summary: Improve bounded OVER support non-retract method AGG
 Key: FLINK-7471
 URL: https://issues.apache.org/jira/browse/FLINK-7471
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: sunjincheng
Assignee: sunjincheng


Currently, bounded OVER windows only support aggregate functions that provide
a {{retract}} method. This JIRA will add support for aggregates without a
{{retract}} method.
What do you think? [~fhueske]
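For context, here is a count-style UDAGG sketch following the Table API {{AggregateFunction}} contract, showing why {{retract}} is currently required for bounded OVER windows (illustrative, not part of this change):

{code}
import org.apache.flink.table.functions.AggregateFunction;

// Accumulator for the count aggregate.
public class CountAcc {
    public long count = 0L;
}

// A retractable count: bounded OVER windows currently need retract() to
// subtract rows that leave the window. This JIRA proposes to also support
// aggregates that lack such a method.
public class RetractableCount extends AggregateFunction<Long, CountAcc> {

    @Override
    public CountAcc createAccumulator() {
        return new CountAcc();
    }

    public void accumulate(CountAcc acc, Long value) {
        acc.count += 1;
    }

    // Without retract(), expired rows cannot be removed incrementally.
    public void retract(CountAcc acc, Long value) {
        acc.count -= 1;
    }

    @Override
    public Long getValue(CountAcc acc) {
        return acc.count;
    }
}
{code}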





[jira] [Created] (FLINK-7470) Acquire RM leadership before registering with Mesos

2017-08-17 Thread Eron Wright (JIRA)
Eron Wright created FLINK-7470:
---

 Summary: Acquire RM leadership before registering with Mesos
 Key: FLINK-7470
 URL: https://issues.apache.org/jira/browse/FLINK-7470
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 



Mesos doesn't support fencing tokens in the scheduler protocol; it assumes
external leader election among scheduler instances. The last connection wins;
prior connections for a given framework ID are closed.

The Mesos RM should not register as a framework until it has acquired RM
leadership. Evolve the ResourceManager as necessary. One option is to
introduce a ResourceManagerRunner that acquires leadership before starting
the RM.
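A minimal sketch of that option, with a hypothetical callback interface patterned loosely on Flink's leader-election callbacks (not the final design):

{code}
import java.util.UUID;

// Hypothetical leader-election callback, for illustration only.
interface LeaderCallback {
    void grantLeadership(UUID leaderSessionId);
    void revokeLeadership();
}

// Only connect to Mesos once leadership is granted: since the last scheduler
// connection wins, a non-leader must never register with the framework ID.
class ResourceManagerRunner implements LeaderCallback {

    private final Runnable startResourceManager;
    private final Runnable stopResourceManager;

    ResourceManagerRunner(Runnable start, Runnable stop) {
        this.startResourceManager = start;
        this.stopResourceManager = stop;
    }

    @Override
    public void grantLeadership(UUID leaderSessionId) {
        startResourceManager.run(); // safe to register as a Mesos framework now
    }

    @Override
    public void revokeLeadership() {
        stopResourceManager.run(); // disconnect before another instance takes over
    }
}
{code}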





[jira] [Created] (FLINK-7469) Handle slot requests occurring before RM registration completes

2017-08-17 Thread Eron Wright (JIRA)
Eron Wright created FLINK-7469:
---

 Summary: Handle slot requests occurring before RM registration completes
 Key: FLINK-7469
 URL: https://issues.apache.org/jira/browse/FLINK-7469
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 
Priority: Minor



Occasionally the TM-to-RM registration ask times out, causing the TM to pause
registration for 10 seconds. Meanwhile, the registration may actually have
succeeded in the RM. Slot requests may then arrive at the TM while RM
registration is incomplete.

The current behavior appears to be that the TM honors the slot request.
Please determine whether this is a feature or a bug. If it is a feature,
maybe a slot request should implicitly complete the registration.
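The two options, sketched with hypothetical names (illustrative only, not the actual TaskExecutor code):

{code}
// Hypothetical sketch of the decision point described above.
class SlotRequestDecisionSketch {

    private boolean registrationComplete;

    boolean onSlotRequest(Object slotRequest) {
        if (!registrationComplete) {
            // Option A (treat as a bug): decline until the handshake finishes.
            // return false;

            // Option B (treat as a feature): let the slot request implicitly
            // complete the registration, since the RM evidently accepted it.
            registrationComplete = true;
        }
        return allocate(slotRequest);
    }

    private boolean allocate(Object slotRequest) {
        return true; // stand-in for the actual slot allocation
    }
}
{code}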

See the attached log showing one TM exhibiting the described behavior.





Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-08-17 Thread Stephan Ewen
@Greg - One benefit I can clearly see is the following:

If we keep that old 1.1-style state code, then we have to guarantee its
correctness in the face of the changes that have been made (consolidating
state code to be per-operator rather than per-task in the runtime as well)
and the changes that are WIP, for example for state evolution (like eager
state) or a better failover (don't reload state from DFS if the node did
not actually crash).
Guaranteeing that correctness is a lot of work. I think that not ensuring
the correctness as thoroughly would be worse than removing that code.

The CEP library had not really reached a stable state in 1.2 anyway; it was
broken and reworked to introduce features like 'loops' or quantifiers. So
on that side, there would be no 1.2 compatibility anyway.

Do you see any concrete case where dropping 1.1 compatibility breaks any
setups? I personally know of no user that is still on 1.1; most were very
eager to get to 1.2 due to rescalable state.

As someone who has also worked on the state code, I completely understand
Stefan's desire to simplify things there - it really slows down
development in that component big time right now...


On Thu, Aug 17, 2017 at 1:34 PM, Stefan Richter wrote:

> I think we are still doing changes for which this is relevant. Also I
> cannot really see a benefit in delaying this because the whole discussion
> will apply in exactly the same way to 1.5.
>
>> On 17.08.2017 at 13:29, Greg Hogan wrote:
>>
>> There’s an argument for delaying this change to 1.5 since the feature
>> freeze is two weeks away. There is little time to realize benefits from
>> removing this code.
>>
>> "The reason for that is that there is a lot of code mapping between the
>> completely different legacy format (1.1.x, not re-scalable) and the
>> key-group-oriented format (1.2.x onwards, re-scalable). It would greatly
>> help the development of state and checkpointing features to drop that old
>> code."
>>
>> Greg
>>
>>
>>> On Aug 17, 2017, at 5:36 AM, Stefan Richter <s.rich...@data-artisans.com> wrote:
>>>
>>> One more comment about the consequences of this PR, as pointed out in
>>> the comments on Github: this will also break direct compatibility for the
>>> CEP library between Flink 1.2 and 1.4. There is still a way to migrate via
>>> Flink 1.3: Flink 1.1/2 -> savepoint -> Flink 1.3 -> savepoint -> Flink 1.4.
>>>
>>>> On 16.08.2017 at 17:31, Stefan Richter <s.rich...@data-artisans.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> after there have been no objections for a long time, I took the next
>>>> step and created a PR that implements this change in commit
>>>> 95e44099784c9deaf2ca422b8dfc11c3d67d7f82 of
>>>> https://github.com/apache/flink/pull/4550 . Announcing
>>>> this here as a last opportunity for further discussions. FYI, this will
>>>> decrease the code base by almost 12K LOC.
>>>>
>>>> Best,
>>>> Stefan
>>>>
>>>>
>>>>> On 02.08.2017 at 15:26, Kostas Kloudas <k.klou...@data-artisans.com> wrote:
>>>>>
>>>>> +1
>>>>>
>>>>>> On Aug 2, 2017, at 3:16 PM, Till Rohrmann wrote:
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Wed, Aug 2, 2017 at 9:12 AM, Stefan Richter
>>>>>> <s.rich...@data-artisans.com> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On 28.07.2017 at 16:03, Stephan Ewen wrote:
>>>>>>>
>>>>>>> Seems like no one raised a concern so far about dropping the
>>>>>>> savepoint format compatibility for 1.1 in 1.4.
>>>>>>>
>>>>>>> Leaving this thread open for some more days, but from the
>>>>>>> sentiment, it seems like we should go ahead?
>>>>>>>
>>>>>>> On Wed, Jul 12, 2017 at 4:43 PM, Stephan Ewen wrote:
>>>>>>>
>>>>>>>> Hi users!
>>>>>>>>
>>>>>>>> Flink currently maintains backwards compatibility for savepoint
>>>>>>>> formats, which means that savepoints taken with Flink version 1.1.x
>>>>>>>> and 1.2.x can be resumed in Flink 1.3.x.
>>>>>>>>
>>>>>>>> We are discussing how many versions back to support. The
>>>>>>>> proposition is the following:
>>>>>>>>
>>>>>>>> *Suggestion: Flink 1.4.0 will be able to resume savepoints
>>>>>>>> taken with version 1.3.x and 1.2.x, but not savepoints from version
>>>>>>>> 1.1.x and 1.0.x*
>>>>>>>>
>>>>>>>> The reason for that is that there is a lot of code mapping between
>>>>>>>> the completely different legacy format (1.1.x, not re-scalable) and
>>>>>>>> the key-group-oriented format (1.2.x onwards, re-scalable). It would
>>>>>>>> greatly help the development of state and checkpointing features to
>>>>>>>> drop that old code.
>>>>>>>>
>>>>>>>> Please let us know if you have concerns about that.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stephan

[jira] [Created] (FLINK-7468) Implement Netty sender backlog logic for credit-based flow control

2017-08-17 Thread zhijiang (JIRA)
zhijiang created FLINK-7468:
---

 Summary: Implement Netty sender backlog logic for credit-based flow control
 Key: FLINK-7468
 URL: https://issues.apache.org/jira/browse/FLINK-7468
 Project: Flink
  Issue Type: Sub-task
  Components: Network
Reporter: zhijiang
Assignee: zhijiang
 Fix For: 1.4.0


This is part of the work on credit-based network flow control.

Receivers should know how many buffers are available on the sender side (the
backlog). The receivers use this information to decide how to distribute
floating buffers.

The {{ResultSubpartition}} maintains the backlog, which counts only the
buffers in this subpartition, excluding events. The backlog is increased when
a buffer is added to the subpartition and decreased when a buffer is polled
from it.

The sender attaches the backlog to each {{BufferResponse}} as an absolute
value after the buffer has been transferred.
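A minimal, self-contained sketch of that bookkeeping (illustrative; the real logic lives in {{ResultSubpartition}}):

{code}
// Illustrative backlog bookkeeping: count buffers only, not events, and
// report the absolute backlog whenever a buffer is polled for transfer.
class SubpartitionBacklogSketch {

    private int backlog; // number of buffers currently in the subpartition

    synchronized void add(boolean isBuffer) {
        if (isBuffer) {
            backlog++; // events do not count toward the backlog
        }
    }

    // Returns the backlog after polling, i.e. the absolute value the sender
    // would attach to the outgoing BufferResponse.
    synchronized int poll(boolean isBuffer) {
        if (isBuffer) {
            backlog--;
        }
        return backlog;
    }
}
{code}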






Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-08-17 Thread Stefan Richter
I think we are still doing changes for which this is relevant. Also I cannot 
really see a benefit in delaying this because the whole discussion will apply 
in exactly the same way to 1.5.

> On 17.08.2017 at 13:29, Greg Hogan wrote:
>
> There’s an argument for delaying this change to 1.5 since the feature freeze
> is two weeks away. There is little time to realize benefits from removing
> this code.
>
> "The reason for that is that there is a lot of code mapping between the
> completely different legacy format (1.1.x, not re-scalable) and the
> key-group-oriented format (1.2.x onwards, re-scalable). It would greatly help
> the development of state and checkpointing features to drop that old code."
>
> Greg
>
>
>> On Aug 17, 2017, at 5:36 AM, Stefan Richter wrote:
>>
>> One more comment about the consequences of this PR, as pointed out in the
>> comments on Github: this will also break direct compatibility for the CEP
>> library between Flink 1.2 and 1.4. There is still a way to migrate via Flink
>> 1.3: Flink 1.1/2 -> savepoint -> Flink 1.3 -> savepoint -> Flink 1.4.
>>
>>> On 16.08.2017 at 17:31, Stefan Richter wrote:
>>>
>>> Hi,
>>>
>>> after there have been no objections for a long time, I took the next step
>>> and created a PR that implements this change in commit
>>> 95e44099784c9deaf2ca422b8dfc11c3d67d7f82 of
>>> https://github.com/apache/flink/pull/4550 . Announcing this here as a
>>> last opportunity for further discussions. FYI, this will decrease the code
>>> base by almost 12K LOC.
>>>
>>> Best,
>>> Stefan
>>>
>>>
>>>> On 02.08.2017 at 15:26, Kostas Kloudas wrote:
>>>>
>>>> +1
>>>>
>>>>> On Aug 2, 2017, at 3:16 PM, Till Rohrmann wrote:
>>>>>
>>>>> +1
>>>>>
>>>>> On Wed, Aug 2, 2017 at 9:12 AM, Stefan Richter wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On 28.07.2017 at 16:03, Stephan Ewen wrote:
>>>>>>
>>>>>> Seems like no one raised a concern so far about dropping the savepoint
>>>>>> format compatibility for 1.1 in 1.4.
>>>>>>
>>>>>> Leaving this thread open for some more days, but from the sentiment, it
>>>>>> seems like we should go ahead?
>>>>>>
>>>>>> On Wed, Jul 12, 2017 at 4:43 PM, Stephan Ewen wrote:
>>>>>>
>>>>>>> Hi users!
>>>>>>>
>>>>>>> Flink currently maintains backwards compatibility for savepoint formats,
>>>>>>> which means that savepoints taken with Flink version 1.1.x and 1.2.x
>>>>>>> can be resumed in Flink 1.3.x.
>>>>>>>
>>>>>>> We are discussing how many versions back to support. The proposition is
>>>>>>> the following:
>>>>>>>
>>>>>>> *Suggestion: Flink 1.4.0 will be able to resume savepoints taken with
>>>>>>> version 1.3.x and 1.2.x, but not savepoints from version 1.1.x and
>>>>>>> 1.0.x*
>>>>>>>
>>>>>>> The reason for that is that there is a lot of code mapping between the
>>>>>>> completely different legacy format (1.1.x, not re-scalable) and the
>>>>>>> key-group-oriented format (1.2.x onwards, re-scalable). It would greatly
>>>>>>> help the development of state and checkpointing features to drop that
>>>>>>> old code.
>>>>>>>
>>>>>>> Please let us know if you have concerns about that.
>>>>>>>
>>>>>>> Best,
>>>>>>> Stephan



Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-08-17 Thread Greg Hogan
There’s an argument for delaying this change to 1.5 since the feature freeze is 
two weeks away. There is little time to realize benefits from removing this 
code.

"The reason for that is that there is a lot of code mapping between the 
completely different legacy format (1.1.x, not re-scalable) and the 
key-group-oriented format (1.2.x onwards, re-scalable). It would greatly help 
the development of state and checkpointing features to drop that old code."

Greg


> On Aug 17, 2017, at 5:36 AM, Stefan Richter wrote:
>
> One more comment about the consequences of this PR, as pointed out in the
> comments on Github: this will also break direct compatibility for the CEP
> library between Flink 1.2 and 1.4. There is still a way to migrate via Flink
> 1.3: Flink 1.1/2 -> savepoint -> Flink 1.3 -> savepoint -> Flink 1.4.
>
>> On 16.08.2017 at 17:31, Stefan Richter wrote:
>>
>> Hi,
>>
>> after there have been no objections for a long time, I took the next step
>> and created a PR that implements this change in commit
>> 95e44099784c9deaf2ca422b8dfc11c3d67d7f82 of
>> https://github.com/apache/flink/pull/4550 . Announcing this here as a last
>> opportunity for further discussions. FYI, this will decrease the code base
>> by almost 12K LOC.
>>
>> Best,
>> Stefan
>>
>>
>>> On 02.08.2017 at 15:26, Kostas Kloudas wrote:
>>>
>>> +1
>>>
>>>> On Aug 2, 2017, at 3:16 PM, Till Rohrmann wrote:
>>>>
>>>> +1
>>>>
>>>> On Wed, Aug 2, 2017 at 9:12 AM, Stefan Richter wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On 28.07.2017 at 16:03, Stephan Ewen wrote:
>>>>>
>>>>> Seems like no one raised a concern so far about dropping the savepoint
>>>>> format compatibility for 1.1 in 1.4.
>>>>>
>>>>> Leaving this thread open for some more days, but from the sentiment, it
>>>>> seems like we should go ahead?
>>>>>
>>>>> On Wed, Jul 12, 2017 at 4:43 PM, Stephan Ewen wrote:
>>>>>
>>>>>> Hi users!
>>>>>>
>>>>>> Flink currently maintains backwards compatibility for savepoint formats,
>>>>>> which means that savepoints taken with Flink version 1.1.x and 1.2.x can
>>>>>> be resumed in Flink 1.3.x.
>>>>>>
>>>>>> We are discussing how many versions back to support. The proposition is
>>>>>> the following:
>>>>>>
>>>>>> *Suggestion: Flink 1.4.0 will be able to resume savepoints taken with
>>>>>> version 1.3.x and 1.2.x, but not savepoints from version 1.1.x and 1.0.x*
>>>>>>
>>>>>> The reason for that is that there is a lot of code mapping between the
>>>>>> completely different legacy format (1.1.x, not re-scalable) and the
>>>>>> key-group-oriented format (1.2.x onwards, re-scalable). It would greatly
>>>>>> help the development of state and checkpointing features to drop that old
>>>>>> code.
>>>>>>
>>>>>> Please let us know if you have concerns about that.
>>>>>>
>>>>>> Best,
>>>>>> Stephan



Re: Looking for a Mentor

2017-08-17 Thread Chesnay Schepler

Hello Sumit,

it's great that you want to contribute to Flink!

As a first step, please make sure to go through the How to Contribute guide.
While it is rather large and dry at times, it gives a good overview of the
contribution process and guidelines.

Next up, have a look at the Flink JIRA and search for issues that might
interest you.
(Hint: also check out issues that are already assigned to people but haven't
made progress in a while.)

If you have questions about an issue, feel free to ask them in the
associated JIRA issue.

After giving me your JIRA username, I can also give you contributor
permissions so you can assign issues to yourself.

Naturally, user and dev experience are important to us, so if you find any
issue while setting up your dev environment or trying out Flink (or not even
/issues/, but inconveniences), please create new JIRA issues.

Looking forward to your contributions,
Chesnay

On 17.08.2017 11:54, Sumit Sarin wrote:

Hey everyone,
I am still looking for any chance to have a mentor to help me contribute to
Flink.
Regards
Sumit

On Mon, 14 Aug 2017 at 5:05 PM, Sumit Sarin wrote:


Hi everyone,
My name is Sumit Sarin and I am new to contributing to open source. I
am from India (time zone: GMT+5:30). I have worked on a few Java
projects before and am familiar with git. I wish to learn from as
well as contribute to the Flink community. I can easily contribute 2-3
hours a day (no time zone restrictions).
It would be wonderful to have a mentor who can guide me through the
process so I can be an asset to Flink.
Thanking you,
Regards,
Sumit





[jira] [Created] (FLINK-7467) a slow client on the BlobServer may block writing operations

2017-08-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7467:
--

 Summary: a slow client on the BlobServer may block writing 
operations
 Key: FLINK-7467
 URL: https://issues.apache.org/jira/browse/FLINK-7467
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination, Network
Affects Versions: 1.3.2, 1.3.1, 1.3.0, 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


In FLINK-6020, a locking mechanism was introduced to isolate reading from
writing operations. This, however, causes a slow client to slow down all
write operations, which may not be desirable at this level.
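A simplified model of the contention (illustrative only, not the actual BlobServer code): if the read lock is held for the whole transfer to a slow client, every writer waits behind it.

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Holding the read lock across the whole transfer lets one slow client
// stall all writers.
class BlobServerLockSketch {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private byte[] blob = new byte[0];

    void serve(OutputStream client) throws IOException {
        lock.readLock().lock();
        try {
            client.write(blob); // may block arbitrarily long on a slow client
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(byte[] newBlob) {
        lock.writeLock().lock(); // queues behind every reader, including the slow one
        try {
            blob = newBlob;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
{code}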





[jira] [Created] (FLINK-7466) Add a flink connector for Apache RocketMQ

2017-08-17 Thread yukon (JIRA)
yukon created FLINK-7466:


 Summary: Add a flink connector for Apache RocketMQ
 Key: FLINK-7466
 URL: https://issues.apache.org/jira/browse/FLINK-7466
 Project: Flink
  Issue Type: New Feature
Reporter: yukon


Hi Flink community,

Flink is really a great stream processing framework. I would like to
contribute a flink-rocketmq-connector; if you think it's acceptable, I will
submit a pull request soon.

Apache RocketMQ is a distributed messaging and streaming platform with low
latency, high performance and reliability, trillion-level capacity and
flexible scalability. For more info, please refer to http://rocketmq.apache.org/





Re: Looking for a Mentor

2017-08-17 Thread Sumit Sarin
Hey everyone,
I am still looking for any chance to have a mentor to help me contribute to
Flink.
Regards
Sumit

On Mon, 14 Aug 2017 at 5:05 PM, Sumit Sarin wrote:

> Hi everyone,
> My name is Sumit Sarin and I am new to contributing to open source. I
> am from India (time zone: GMT+5:30). I have worked on a few Java
> projects before and am familiar with git. I wish to learn from as
> well as contribute to the Flink community. I can easily contribute 2-3
> hours a day (no time zone restrictions).
> It would be wonderful to have a mentor who can guide me through the
> process so I can be an asset to Flink.
> Thanking you,
> Regards,
> Sumit


Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-08-17 Thread Stefan Richter
One more comment about the consequences of this PR, as pointed out in the 
comments on Github: this will also break direct compatibility for the CEP 
library between Flink 1.2 and 1.4. There is still a way to migrate via Flink 
1.3: Flink 1.1/2 -> savepoint -> Flink 1.3 -> savepoint -> Flink 1.4.

> On 16.08.2017 at 17:31, Stefan Richter wrote:
>
> Hi,
>
> after there have been no objections for a long time, I took the next step
> and created a PR that implements this change in commit
> 95e44099784c9deaf2ca422b8dfc11c3d67d7f82 of
> https://github.com/apache/flink/pull/4550 . Announcing this here as a last
> opportunity for further discussions. FYI, this will decrease the code base by
> almost 12K LOC.
>
> Best,
> Stefan
>
>
>> On 02.08.2017 at 15:26, Kostas Kloudas wrote:
>>
>> +1
>>
>>> On Aug 2, 2017, at 3:16 PM, Till Rohrmann wrote:
>>>
>>> +1
>>>
>>> On Wed, Aug 2, 2017 at 9:12 AM, Stefan Richter wrote:
>>>
>>>> +1
>>>>
>>>> On 28.07.2017 at 16:03, Stephan Ewen wrote:
>>>>
>>>> Seems like no one raised a concern so far about dropping the savepoint
>>>> format compatibility for 1.1 in 1.4.
>>>>
>>>> Leaving this thread open for some more days, but from the sentiment, it
>>>> seems like we should go ahead?
>>>>
>>>> On Wed, Jul 12, 2017 at 4:43 PM, Stephan Ewen wrote:
>>>>
>>>>> Hi users!
>>>>>
>>>>> Flink currently maintains backwards compatibility for savepoint formats,
>>>>> which means that savepoints taken with Flink version 1.1.x and 1.2.x can
>>>>> be resumed in Flink 1.3.x.
>>>>>
>>>>> We are discussing how many versions back to support. The proposition is
>>>>> the following:
>>>>>
>>>>> *Suggestion: Flink 1.4.0 will be able to resume savepoints taken with
>>>>> version 1.3.x and 1.2.x, but not savepoints from version 1.1.x and 1.0.x*
>>>>>
>>>>> The reason for that is that there is a lot of code mapping between the
>>>>> completely different legacy format (1.1.x, not re-scalable) and the
>>>>> key-group-oriented format (1.2.x onwards, re-scalable). It would greatly
>>>>> help the development of state and checkpointing features to drop that old
>>>>> code.
>>>>>
>>>>> Please let us know if you have concerns about that.
>>>>>
>>>>> Best,
>>>>> Stephan



[jira] [Created] (FLINK-7465) Add built-in BloomFilterCount to the Table API

2017-08-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-7465:
--

 Summary: Add built-in BloomFilterCount to the Table API
 Key: FLINK-7465
 URL: https://issues.apache.org/jira/browse/FLINK-7465
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Reporter: sunjincheng
Assignee: sunjincheng


In this JIRA, we will use a BloomFilter to implement a counting function.
BloomFilter algorithm description:
An empty Bloom filter is a bit array of m bits, all set to 0. There must also
be k different hash functions defined, each of which maps or hashes some set
element to one of the m array positions, generating a uniform random
distribution. Typically, k is a constant, much smaller than m, which is
proportional to the number of elements to be added; the precise choice of k
and the constant of proportionality of m are determined by the intended false
positive rate of the filter.

To add an element, feed it to each of the k hash functions to get k array
positions. Set the bits at all these positions to 1.

To query for an element (test whether it is in the set), feed it to each of
the k hash functions to get k array positions. If any of the bits at these
positions is 0, the element is definitely not in the set – if it were, then
all the bits would have been set to 1 when it was inserted. If all are 1,
then either the element is in the set, or the bits have by chance been set to
1 during the insertion of other elements, resulting in a false positive.

As an example, consider a Bloom filter with m = 18 and k = 3 representing the
set {x, y, z}: the element w is not in the set, because it hashes to at least
one bit-array position containing 0. A sketch is shown here:
!https://en.wikipedia.org/wiki/Bloom_filter#/media/File:Bloom_filter.svg!
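A minimal, self-contained sketch of the algorithm (illustrative; not the proposed BloomFilterCount implementation):

{code}
import java.util.BitSet;

// Bloom filter with m bits and k hash positions per element, derived from
// two base hashes (Kirsch-Mitzenmacher double hashing).
class BloomFilterSketch {

    private final BitSet bits;
    private final int m; // number of bits, m > 0
    private final int k; // number of hash functions

    BloomFilterSketch(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    private int position(Object element, int i) {
        int h1 = element.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m); // i-th position in [0, m)
    }

    void add(Object element) {
        for (int i = 0; i < k; i++) {
            bits.set(position(element, i));
        }
    }

    // false -> definitely not in the set; true -> in the set or a false positive
    boolean mightContain(Object element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(element, i))) {
                return false;
            }
        }
        return true;
    }
}
{code}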





[jira] [Created] (FLINK-7464) Add useful built-in aggregate functions to the Table API

2017-08-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-7464:
--

 Summary: Add useful built-in aggregate functions to the Table API
 Key: FLINK-7464
 URL: https://issues.apache.org/jira/browse/FLINK-7464
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: sunjincheng


In this JIRA, we will create sub-tasks for adding specific built-in aggregate
functions, such as FIRST_VALUE, LAST_VALUE, BloomFilterCount, etc.

Everyone is welcome to add sub-tasks.


