[jira] [Updated] (FLINK-3802) Add Very Fast Reservoir Sampling
[ https://issues.apache.org/jira/browse/FLINK-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-3802:

    Labels: Sampling pull-request-available  (was: Sampling)

> Add Very Fast Reservoir Sampling
>
>                 Key: FLINK-3802
>                 URL: https://issues.apache.org/jira/browse/FLINK-3802
>             Project: Flink
>          Issue Type: Improvement
>          Components: Library / Machine Learning
>            Reporter: Chenguang He
>            Assignee: Chenguang He
>            Priority: Major
>              Labels: Sampling, pull-request-available
>
> Adding Very Fast Reservoir Sampling
> (http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/).
> An improved version of Reservoir Sampling, it is used to draw a small
> sample from a large dataset, where the size of the dataset is much larger
> than the size of the sample.
> The linked post proves that the sampling is random. The probability of
> sampling an element is R/J, where R is the size of the sample and J is the
> index of the element in the streaming data.
> Thanks to Erik Erlandson, the author of this algorithm, who helped me with
> the implementation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (FLINK-3802) Add Very Fast Reservoir Sampling
[ https://issues.apache.org/jira/browse/FLINK-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-3802:

    Component/s:     (was: Java API)
                 Library / Machine Learning

> Add Very Fast Reservoir Sampling
>
>                 Key: FLINK-3802
>                 URL: https://issues.apache.org/jira/browse/FLINK-3802
>             Project: Flink
>          Issue Type: Improvement
>          Components: Library / Machine Learning
>            Reporter: Chenguang He
>            Assignee: Chenguang He
>            Priority: Major
>              Labels: Sampling
>
> Adding Very Fast Reservoir Sampling
> (http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/).
> An improved version of Reservoir Sampling, it is used to draw a small
> sample from a large dataset, where the size of the dataset is much larger
> than the size of the sample.
> The linked post proves that the sampling is random. The probability of
> sampling an element is R/J, where R is the size of the sample and J is the
> index of the element in the streaming data.
> Thanks to Erik Erlandson, the author of this algorithm, who helped me with
> the implementation.
[jira] [Updated] (FLINK-3802) Add Very Fast Reservoir Sampling
[ https://issues.apache.org/jira/browse/FLINK-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chenguang He updated FLINK-3802:

    Description:
Adding Very Fast Reservoir Sampling
(http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/).
An improved version of Reservoir Sampling, it is used to draw a small sample from a large dataset, where the size of the dataset is much larger than the size of the sample.
The linked post proves that the sampling is random. The probability of sampling an element is R/J, where R is the size of the sample and J is the index of the element in the streaming data.
Thanks to Erik Erlandson, the author of this algorithm, who helped me with the implementation.

  was:
Adding Very Fast Reservoir Sampling
(http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/)
An improvement version of Reservoir Sampling, it's used to deal with small sampling in large dataset, where the set of dataset is much larger than the size of sampling.
It is a random sampling proved in the link. The average possibility is P(R/J), where R is size of sampling and J is index of streaming data
Thanks Erik Erlandson who is the author of this algorithm help me with implementation.

> Add Very Fast Reservoir Sampling
>
>                 Key: FLINK-3802
>                 URL: https://issues.apache.org/jira/browse/FLINK-3802
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API
>            Reporter: Chenguang He
>            Assignee: Chenguang He
>              Labels: Sampling
>
> Adding Very Fast Reservoir Sampling
> (http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/).
> An improved version of Reservoir Sampling, it is used to draw a small
> sample from a large dataset, where the size of the dataset is much larger
> than the size of the sample.
> The linked post proves that the sampling is random. The probability of
> sampling an element is R/J, where R is the size of the sample and J is the
> index of the element in the streaming data.
> Thanks to Erik Erlandson, the author of this algorithm, who helped me with
> the implementation.
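For readers following this issue, the idea in the description can be sketched roughly as below. This is an illustration only and not the code from the pull request: the class name FastReservoirSketch, the method name sample, and the switch-over threshold of 4 * R are all assumptions made for this example. The first phase is classic reservoir sampling, where the element at index J is kept with probability R/J. Once R/J is small, the second phase draws a geometric gap and skips that many elements instead of drawing one random number per element. Holding p = R/J fixed while drawing each gap is an approximation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

final class FastReservoirSketch {

    /** Returns a sample of up to r elements from data (hypothetical helper, not a Flink API). */
    static <T> List<T> sample(Iterator<T> data, int r, Random rng) {
        if (r <= 0) {
            throw new IllegalArgumentException("r must be positive");
        }
        List<T> reservoir = new ArrayList<>(r);

        // Fill the reservoir with the first r elements.
        while (reservoir.size() < r && data.hasNext()) {
            reservoir.add(data.next());
        }
        long j = reservoir.size();   // number of elements consumed so far
        long threshold = 4L * r;     // assumed switch-over point for this sketch

        // Phase 1: classic reservoir sampling, element j is kept with probability r / j.
        while (j < threshold && data.hasNext()) {
            T item = data.next();
            j++;
            long k = (long) (rng.nextDouble() * j);  // uniform in [0, j)
            if (k < r) {
                reservoir.set((int) k, item);
            }
        }

        // Phase 2: gap sampling. Draw how many elements to skip before the next
        // replacement from a geometric distribution with success probability p.
        while (data.hasNext()) {
            double p = (double) r / (j + 1);                         // p for the next element
            double u = Math.max(rng.nextDouble(), Double.MIN_VALUE); // avoid log(0)
            long gap = (long) Math.floor(Math.log(u) / Math.log1p(-p));
            while (gap > 0 && data.hasNext()) {                      // skip without random draws
                data.next();
                j++;
                gap--;
            }
            if (data.hasNext()) {
                // The element after the gap replaces a uniformly chosen slot.
                reservoir.set(rng.nextInt(r), data.next());
                j++;
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        Iterator<Integer> stream = IntStream.range(0, 1_000_000).boxed().iterator();
        List<Integer> picked = sample(stream, 10, new Random());
        System.out.println(picked);
    }
}
```

The benefit is in the second phase: for a sample of size R drawn from a much larger stream, most elements are skipped with no random number generated for them, which is where the speed-up over per-element sampling comes from.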