[GitHub] flink pull request: [FLINK-3802] Add Very Fast Reservoir Sampling
GitHub user gaoyike opened a pull request:

https://github.com/apache/flink/pull/1924

[FLINK-3802] Add Very Fast Reservoir Sampling

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes.

- [x] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message
- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

An in-memory implementation of Very Fast Reservoir Sampling. The algorithm works well when the size of the streaming data is much larger than the size of the reservoir. It samples each element with probability R/j, where R is the size of the sample (the reservoir) and j is the current index in the streaming data. The algorithm consists of two parts:

(1) Before the size of the streaming data reaches the threshold, it uses regular reservoir sampling.
(2) After the size of the streaming data reaches the threshold, it uses a geometric distribution to generate an approximate gap of elements to skip; the size of the gap is drawn from a geometric distribution with probability p = R/j.

Thanks to Erik Erlandson, who is the author of this algorithm and helped me with the implementation.
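The two phases described above can be sketched roughly as follows. This is a minimal illustration and not the code from this pull request: the class name `VeryFastReservoirSketch`, the method signature, and the explicit `threshold` parameter are assumptions made for the example. The gap is drawn as floor(log(u) / log(1 - p)), which is the usual inverse-CDF way of sampling a geometric distribution.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; names and signature are hypothetical, not taken from the pull request. */
final class VeryFastReservoirSketch {

    private VeryFastReservoirSketch() {}

    /**
     * Samples up to r elements from the input.
     *
     * @param threshold stream index up to which regular reservoir sampling is used (assumed parameter)
     */
    static <T> List<T> sample(Iterator<T> input, int r, long threshold, Random rng) {
        if (r <= 0) {
            throw new IllegalArgumentException("Reservoir size must be positive");
        }

        // Fill the reservoir with the first r elements (or fewer if the input is short).
        List<T> reservoir = new ArrayList<>(r);
        while (reservoir.size() < r && input.hasNext()) {
            reservoir.add(input.next());
        }

        long j = r + 1L; // 1-based index of the next element to read

        // Phase 1: regular reservoir sampling, keeping element j with probability r / j.
        while (j <= threshold && input.hasNext()) {
            T element = input.next();
            if (rng.nextDouble() < (double) r / j) {
                reservoir.set(rng.nextInt(r), element);
            }
            j++;
        }

        // Phase 2: skip a geometrically distributed gap, then keep the next element.
        while (input.hasNext()) {
            double p = (double) r / j;
            double u = Math.max(rng.nextDouble(), Double.MIN_VALUE); // avoid log(0)
            long gap = (long) Math.floor(Math.log(u) / Math.log1p(-p));
            for (long skipped = 0; skipped < gap && input.hasNext(); skipped++) {
                input.next();
            }
            j += gap;
            if (input.hasNext()) {
                reservoir.set(rng.nextInt(r), input.next());
                j++;
            }
        }
        return reservoir;
    }
}
```

A call such as `VeryFastReservoirSketch.sample(data.iterator(), 100, 400L, new Random())` would keep 100 elements; the threshold of 400 in that call is only an illustrative choice. The benefit of the second phase is that it draws one random number per accepted element instead of one per stream element, at the cost of the gap distribution being an approximation.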
Reference: http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaoyike/flink master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1924.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1924

commit 81e0622b20d8bc969dec1555bd55d4230d9b38de
Author: æ¨å ä½ <gaoy...@gmail.com>
Date:   2016-04-21T21:42:26Z

    An in-memory implementation of Very Fast Reservoir Sampling. The algorithm works well when the size of the streaming data is much larger than the size of the reservoir. It samples each element with probability R/j, where R is the size of the sample (the reservoir) and j is the current index in the streaming data. The algorithm consists of two parts:

    (1) Before the size of the streaming data reaches the threshold, it uses regular reservoir sampling.
    (2) After the size of the streaming data reaches the threshold, it uses a geometric distribution to generate an approximate gap of elements to skip; the size of the gap is drawn from a geometric distribution with probability p = R/j.

    Thanks to Erik Erlandson, who is the author of this algorithm and helped me with the implementation.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...
Github user gaoyike commented on the pull request: https://github.com/apache/flink/pull/1908#issuecomment-212460975 Done!
[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...
Github user gaoyike commented on the pull request: https://github.com/apache/flink/pull/1908#issuecomment-212011148 Updated! Thanks uce!
[GitHub] flink pull request: [FLINK-3783] [core] Support weighted random sa...
Github user gaoyike commented on the pull request:

https://github.com/apache/flink/pull/1909#issuecomment-211930810

Which core algorithm does this use: A-ES or A-Chao?

2016-04-19 6:41 GMT-05:00 Trevor Grant <notificati...@github.com>:

> Nice. If this gets merged before #1898
> <https://github.com/apache/flink/pull/1898> I'll integrate it in.
> Otherwise I'll open a separate PR after.
[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...
GitHub user gaoyike opened a pull request:

https://github.com/apache/flink/pull/1908

[FLINK-3781] BlobClient may be left unclosed in BlobCache#deleteGlobal()

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes.

- [x] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message
- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

Sorry for the previous incorrect pull request; I forgot to rebase my Git branch.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaoyike/flink master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1908.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1908
[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...
Github user gaoyike closed the pull request at: https://github.com/apache/flink/pull/1907
[GitHub] flink pull request: [FLINK-3781] BlobClient may be left unclosed i...
GitHub user gaoyike opened a pull request:

https://github.com/apache/flink/pull/1907

[FLINK-3781] BlobClient may be left unclosed in BlobCache#deleteGlobal()

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes.

- [x] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message
- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/flink release-1.0.2-rc3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1907.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1907

commit f3c6646e68750a068b3325181b8a16a4689a0fed
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-22T17:37:59Z

    [hotfix] Make DataStream property methods properly Scalaesk

    This also includes some minor cleanups

    This closes #1689

commit df19a8bf908a21fc35830c08cc61d8d0566813eb
Author: Ufuk Celebi <u...@apache.org>
Date:   2016-02-26T11:46:07Z

    [FLINK-3390] [runtime, tests] Restore savepoint path on ExecutionGraph restart

    Temporary work around to restore initial state on failure during recovery as required by a user. Will be superseded by FLINK-3397 with better handling of checkpoint and savepoint restoring.

    A failure during recovery resulted in restarting a job without its savepoint state. This temporary work around makes sure that if the savepoint coordinator ever restored a savepoint and there was no checkpoint after the savepoint, the savepoint state will be restored again.

    This closes #1720.

commit 8c3301501934ee0faeffec3f8c2034d292d078ef
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-26T14:12:07Z

    [FLINK-3522] [storm compat] PrintSampleStream prints a proper message when involked without arguments

commit c0bc8bcf7e1c3ac1f50c3038456f5af888392a06
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T10:57:21Z

    [hotfix] [build] Disable exclusion rules when using build-jar maven profile.

    This closes #1719

commit 2c605d275b26793d8676e35b6ccc5102bdcbf30d
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T13:08:02Z

    [FLINK-3511] [gelly] Introduce flink-gelly-examples module

    The new flink-gelly-examples module contains all Java and Scala Gelly examples. The module contains compile scope dependencies on flink-java, flink-scala and flink-clients so that the examples can be conveniently run from within the IDE.

commit 0601a762a4ee826bc628842e9b38f205fafdb76d
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-26T14:34:06Z

    [hotfix] Remove remaining unused files from the old standalone web client

commit 044479230e984b130f018930adaceb661c9aa80b
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T14:57:45Z

    [FLINK-3511] [avro] Move avro examples to test scope

commit f2de20b02bef66f437164e24e9fc0084530d4b01
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-26T17:19:27Z

    [FLINK-3525] [runtime] Fix call to super.close() in TimestampsAndPeriodicWatermarksOperator

commit 51ab77b16a994f2f511e34bb37f9c2294a234e50
Author: Stephan Ewen <se...@apache.org>
Date:   2016-02-26T17:31:32Z

    [license] Update LICENSE file for the latest version

commit 131f016e71540a5d1e264084c630b93de1aeabae
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T15:12:59Z

    [FLINK-3511] [hadoop-compatibility] Move hadoop-compatibility examples to test scope

commit 434cff00fd7fdc41dfb14f729888abaf12af1f7d
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T15:15:44Z

    [FLINK-3511] [jdbc] Move jdbc examples to test scope and add flink-clients dependency

commit 0dc824080f38d83d9a748d19d04344c3bf4d7077
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-02-26T15:21:13Z

    [FLINK-3511] [nifi