[jira] [Commented] (FLINK-3783) Support weighted random sampling with reservoir

2016-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249721#comment-15249721
 ] 

ASF GitHub Bot commented on FLINK-3783:
---

Github user gallenvara commented on the pull request:

https://github.com/apache/flink/pull/1909#issuecomment-212397350
  
@gaoyike The A-ES algorithm is a weighted random sampling method with a 
reservoir. It creates a sample of a predefined size, and the observed selection 
probabilities of the elements match the expected ones.
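
For readers unfamiliar with A-ES, here is a minimal Python sketch of the idea. It is not the code from the pull request. Each item with weight `w` gets a random key `u ** (1 / w)` with `u` drawn uniformly from [0, 1), and the reservoir keeps the `k` items with the largest keys. The function name `weighted_reservoir_sample`, its parameters, and the choice to skip non-positive weights are assumptions made for illustration only.

```python
import heapq
import random


def weighted_reservoir_sample(items, k, rng=None):
    """Return up to k elements from an iterable of (element, weight) pairs.

    Sketch of A-ES: every item gets the key u ** (1 / weight), and the
    reservoir keeps the k items with the largest keys.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    reservoir = []  # min-heap of (key, arrival_index, element)
    for index, (element, weight) in enumerate(items):
        if weight <= 0:
            continue  # assumption: non-positive weights are never sampled
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, element)  # index breaks ties between equal keys
        if len(reservoir) < k:
            heapq.heappush(reservoir, entry)
        elif key > reservoir[0][0]:
            # The new key beats the smallest key in the reservoir: swap it in.
            heapq.heapreplace(reservoir, entry)
    return [element for _, _, element in reservoir]


sample = weighted_reservoir_sample(
    [("a", 1.0), ("b", 2.0), ("c", 3.0)], k=2, rng=random.Random(7)
)
```

The min-heap keeps the smallest key at the root, so each incoming item needs only one comparison, and the input is read in a single pass.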


> Support weighted random sampling with reservoir
> ---
>
> Key: FLINK-3783
> URL: https://issues.apache.org/jira/browse/FLINK-3783
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Reporter: GaoLun
> Assignee: GaoLun
> Priority: Minor
>
> In default random sampling, all items have the same probability of being 
> selected. In weighted random sampling, the probability of each item being 
> selected is determined by its weight relative to the weights of the other 
> items.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3783) Support weighted random sampling with reservoir

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247774#comment-15247774
 ] 

ASF GitHub Bot commented on FLINK-3783:
---

Github user gaoyike commented on the pull request:

https://github.com/apache/flink/pull/1909#issuecomment-211930810
  
What is the core algorithm: A-ES or A-Chao?


2016-04-19 6:41 GMT-05:00 Trevor Grant :

> Nice. If this gets merged before #1898,
> I'll integrate it in.
> Otherwise I'll open a separate PR after.





[jira] [Commented] (FLINK-3783) Support weighted random sampling with reservoir

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247597#comment-15247597
 ] 

ASF GitHub Bot commented on FLINK-3783:
---

Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/1909#issuecomment-211873070
  
Nice. If this gets merged before https://github.com/apache/flink/pull/1898, 
I'll integrate it in. Otherwise I'll open a separate PR after.




[jira] [Commented] (FLINK-3783) Support weighted random sampling with reservoir

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247254#comment-15247254
 ] 

ASF GitHub Bot commented on FLINK-3783:
---

GitHub user gallenvara opened a pull request:

https://github.com/apache/flink/pull/1909

[FLINK-3783] [core] Support weighted random sampling with reservoir.

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [X] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message

- [X] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

In default random sampling, all items have the same probability of being 
selected. In weighted random sampling, the probability of each item being 
selected is determined by its weight relative to the weights of the other 
items. The reference paper is http://arxiv.org/pdf/1012.0256.pdf. The WRS test 
defines 10 items with different probabilities, counts how often each item 
appears in the reservoir to estimate its selection probability after sampling, 
and compares these estimates with the expected probabilities to verify 
correctness. The paper also describes a method with exponential jumps, which I 
think we can implement in the future.
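
The counting check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the test from the pull request. It uses a reservoir of size 1, because in that case the selection probability of item `i` is exactly `w_i / sum(w)`. The weights 1..10, the trial count, and the helper name `selection_frequencies` are all hypothetical.

```python
import random


def selection_frequencies(weights, trials, seed=0):
    """Estimate how often each item wins a size-1 weighted sample.

    Each trial assigns item i the key u ** (1 / w_i) and selects the item
    with the largest key, which is A-ES with a reservoir of size 1.
    """
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(trials):
        keys = [rng.random() ** (1.0 / w) for w in weights]
        counts[keys.index(max(keys))] += 1
    return [count / trials for count in counts]


weights = list(range(1, 11))  # assumption: 10 items with weights 1..10
expected = [w / sum(weights) for w in weights]
observed = selection_frequencies(weights, trials=100_000)

# Largest gap between the observed frequency and the expected probability.
max_error = max(abs(o - e) for o, e in zip(observed, expected))
```

With this many trials, `max_error` should be small. For reservoirs larger than 1, the inclusion probabilities are no longer simply proportional to the weights, so the expected values would need to be derived differently.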



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gallenvara/flink WeightedRandomSample

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1909.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1909


commit 33b961602dd1d289a979869250b88a43c68ba9ab
Author: gallenvara 
Date:   2016-04-18T08:50:20Z

Support weighted random sampler.



