[GitHub] flink pull request: [FLINK-3802] Add Very Fast Reservoir Sampling

2016-04-21 Thread gaoyike
GitHub user gaoyike opened a pull request:

https://github.com/apache/flink/pull/1924

[FLINK-3802] Add Very Fast Reservoir Sampling

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed



An in-memory implementation of Very Fast Reservoir Sampling. The algorithm 
works well when the size of the streaming data is much larger than the size 
of the reservoir.

  The algorithm samples each element with probability R/j, where R is the 
size of the sample (the reservoir) and j is the current index in the 
streaming data.
  The algorithm consists of two parts:
(1) Before the size of the streaming data reaches a threshold, it uses 
regular reservoir sampling.
(2) After the size of the streaming data reaches the threshold, it draws 
an approximate gap from a geometric distribution with probability p = R/j 
and skips that many elements.
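
For readers following the description above, here is a minimal Java sketch of 
the two phases. It is not the code from this pull request: the class name 
`VeryFastReservoirSketch`, the method `sample`, and the caller-supplied 
`threshold` parameter are hypothetical, and the description does not state 
which threshold value the implementation uses.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; names and parameters are assumptions.
final class VeryFastReservoirSketch {

    static <T> List<T> sample(Iterator<T> input, int r, long threshold, Random rnd) {
        if (r <= 0) {
            throw new IllegalArgumentException("reservoir size must be positive");
        }
        List<T> reservoir = new ArrayList<>(r);
        long j = 0; // number of elements consumed so far

        // Fill the reservoir with the first r elements.
        while (j < r && input.hasNext()) {
            reservoir.add(input.next());
            j++;
        }
        if (j < r) {
            return reservoir; // input was shorter than the reservoir
        }

        // Phase 1: regular reservoir sampling until j reaches the threshold.
        while (j < threshold && input.hasNext()) {
            T item = input.next();
            j++;
            long k = (long) (rnd.nextDouble() * j); // uniform in [0, j)
            if (k < r) {
                reservoir.set((int) k, item); // happens with probability r/j
            }
        }

        // Phase 2: draw a geometric gap with p = r/j, skip it, then replace.
        while (input.hasNext()) {
            double p = (double) r / j;
            double u = 1.0 - rnd.nextDouble(); // in (0, 1], avoids log(0)
            long gap = (long) Math.floor(Math.log(u) / Math.log(1.0 - p));
            for (long s = 0; s < gap && input.hasNext(); s++) {
                input.next(); // skipped element
                j++;
            }
            if (input.hasNext()) {
                reservoir.set(rnd.nextInt(r), input.next());
                j++;
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            data.add(i);
        }
        // Threshold of 4 * r is an assumed value chosen for this example.
        System.out.println(sample(data.iterator(), 10, 40, new Random()));
    }
}
```

The skip in phase 2 is what saves work: only one random number is drawn per 
accepted element instead of one per input element, which matters most when 
the stream is much longer than the reservoir.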

   Thanks to Erik Erlandson, who is the author of this algorithm and helped 
me with the implementation.

Reference: 
http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaoyike/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1924.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1924


commit 81e0622b20d8bc969dec1555bd55d4230d9b38de
Author: 晨光 何 <gaoy...@gmail.com>
Date:   2016-04-21T21:42:26Z

 An in-memory implementation of Very Fast Reservoir Sampling. The algorithm 
works well when the size of the streaming data is much larger than the size 
of the reservoir.
  The algorithm samples each element with probability R/j, where R is the 
size of the sample (the reservoir) and j is the current index in the 
streaming data.
  The algorithm consists of two parts:
(1) Before the size of the streaming data reaches a threshold, it uses 
regular reservoir sampling.
(2) After the size of the streaming data reaches the threshold, it draws 
an approximate gap from a geometric distribution with probability p = R/j 
and skips that many elements.

   Thanks to Erik Erlandson, who is the author of this algorithm and helped 
me with the implementation.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...

2016-04-20 Thread gaoyike
Github user gaoyike commented on the pull request:

https://github.com/apache/flink/pull/1908#issuecomment-212460975
  
Done!





[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...

2016-04-19 Thread gaoyike
Github user gaoyike commented on the pull request:

https://github.com/apache/flink/pull/1908#issuecomment-212011148
  
Updated!

Thanks uce!




[GitHub] flink pull request: [FLINK-3783] [core] Support weighted random sa...

2016-04-19 Thread gaoyike
Github user gaoyike commented on the pull request:

https://github.com/apache/flink/pull/1909#issuecomment-211930810
  
What is the core algorithm: A-ES or A-Chao?


2016-04-19 6:41 GMT-05:00 Trevor Grant <notificati...@github.com>:

> Nice. If this gets merged before #1898
> <https://github.com/apache/flink/pull/1898> I'll integrate it in.
> Otherwise I'll open a separate PR after.





[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...

2016-04-18 Thread gaoyike
GitHub user gaoyike opened a pull request:

https://github.com/apache/flink/pull/1908

[FLINK-3781] BlobClient may be left unclosed in BlobCache#deleteGlobal()


- [x] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed


Sorry for the previous wrong pull request; I forgot to rebase my Git branch.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaoyike/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1908.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1908








[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...

2016-04-18 Thread gaoyike
Github user gaoyike closed the pull request at:

https://github.com/apache/flink/pull/1907




[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...

2016-04-18 Thread gaoyike
GitHub user gaoyike opened a pull request:

https://github.com/apache/flink/pull/1907

[FLINK-3781] BlobClient may be left unclosed in BlobCache#deleteGlobal()


- [x] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/flink release-1.0.2-rc3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1907


commit f3c6646e68750a068b3325181b8a16a4689a0fed
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-22T17:37:59Z

[hotfix] Make DataStream property methods properly Scalaesk

This also includes some minor cleanups

This closes #1689

commit df19a8bf908a21fc35830c08cc61d8d0566813eb
Author: Ufuk Celebi <u...@apache.org>
Date:   2016-02-26T11:46:07Z

[FLINK-3390] [runtime, tests] Restore savepoint path on ExecutionGraph 
restart

Temporary work around to restore initial state on failure during recovery as
required by a user. Will be superseded by FLINK-3397 with better handling of
checkpoint and savepoint restoring.

A failure during recovery resulted in restarting a job without its savepoint
state. This temporary work around makes sure that if the savepoint 
coordinator
ever restored a savepoint and there was no checkpoint after the savepoint,
the savepoint state will be restored again.

This closes #1720.

commit 8c3301501934ee0faeffec3f8c2034d292d078ef
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-26T14:12:07Z

[FLINK-3522] [storm compat] PrintSampleStream prints a proper message when 
invoked without arguments

commit c0bc8bcf7e1c3ac1f50c3038456f5af888392a06
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T10:57:21Z

[hotfix] [build] Disable exclusion rules when using build-jar maven profile.

This closes #1719

commit 2c605d275b26793d8676e35b6ccc5102bdcbf30d
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T13:08:02Z

[FLINK-3511] [gelly] Introduce flink-gelly-examples module

The new flink-gelly-examples module contains all Java and Scala Gelly 
examples. The module
contains compile scope dependencies on flink-java, flink-scala and 
flink-clients so that
the examples can be conveniently run from within the IDE.

commit 0601a762a4ee826bc628842e9b38f205fafdb76d
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-26T14:34:06Z

[hotfix] Remove remaining unused files from the old standalone web client

commit 044479230e984b130f018930adaceb661c9aa80b
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T14:57:45Z

[FLINK-3511] [avro] Move avro examples to test scope

commit f2de20b02bef66f437164e24e9fc0084530d4b01
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-26T17:19:27Z

[FLINK-3525] [runtime] Fix call to super.close() in 
TimestampsAndPeriodicWatermarksOperator

commit 51ab77b16a994f2f511e34bb37f9c2294a234e50
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-26T17:31:32Z

[license] Update LICENSE file for the latest version

commit 131f016e71540a5d1e264084c630b93de1aeabae
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T15:12:59Z

[FLINK-3511] [hadoop-compatibility] Move hadoop-compatibility examples to 
test scope

commit 434cff00fd7fdc41dfb14f729888abaf12af1f7d
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T15:15:44Z

[FLINK-3511] [jdbc] Move jdbc examples to test scope and add flink-clients 
dependency

commit 0dc824080f38d83d9a748d19d04344c3bf4d7077
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T15:21:13Z

[FLINK-3511] [nifi