GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/11578
[SPARK-13742][Core] Add non-iterator interface to RandomSampler
JIRA: https://issues.apache.org/jira/browse/SPARK-13742
## What changes were proposed in this pull request?
`RandomSampler.sample` currently accepts an iterator as input and outputs
another iterator. This makes it unsuitable for use in whole-stage codegen of
the `Sampler` operator (#11517). This change adds a non-iterator interface to
`RandomSampler`.
This change adds a new method `def sample(): Int` to the trait
`RandomSampler`. Because we don't need to know the actual values of the
sampled items, the new method takes no arguments.
The method decides whether the next item is sampled and returns how many
times that item will be sampled.
For `BernoulliSampler` and `BernoulliCellSampler`, the returned count can
only be 0 or 1, which simply indicates whether the next item is sampled.
For `PoissonSampler`, the returned value can be more than 1, meaning the
next item will be sampled multiple times.
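To illustrate the semantics described above, here is a minimal, self-contained sketch. It is not the code in this patch: `CountSampler`, `SimpleBernoulliSampler`, `SimplePoissonSampler`, and `SamplerDemo` are hypothetical names, and the Poisson draw uses Knuth's multiplication method as a stand-in, which may differ from what `PoissonSampler` actually does.

```scala
import java.util.Random

// Hypothetical stand-in for the proposed interface; not Spark's actual trait.
trait CountSampler {
  /** Returns how many times the next item should be emitted. */
  def sample(): Int
}

// Bernoulli: each item is kept at most once, with probability `fraction`.
final class SimpleBernoulliSampler(fraction: Double, seed: Long) extends CountSampler {
  private val rng = new Random(seed)
  override def sample(): Int = if (rng.nextDouble() < fraction) 1 else 0
}

// Poisson: each item may be emitted several times (sampling with replacement).
// Assumption: Knuth's multiplication method, adequate only for small means.
final class SimplePoissonSampler(mean: Double, seed: Long) extends CountSampler {
  private val rng = new Random(seed)
  private val limit = math.exp(-mean)
  override def sample(): Int = {
    var count = 0
    var product = rng.nextDouble()
    // Multiply uniform draws until the product falls to exp(-mean) or below.
    while (product > limit) {
      count += 1
      product *= rng.nextDouble()
    }
    count
  }
}

object SamplerDemo {
  // The caller owns the loop and asks the sampler one item at a time,
  // instead of handing it an iterator and receiving another one back.
  def sampleAll[T](items: Seq[T], sampler: CountSampler): Seq[T] =
    items.flatMap(item => Seq.fill(sampler.sample())(item))

  def main(args: Array[String]): Unit = {
    val data = 1 to 10
    println(sampleAll(data, new SimpleBernoulliSampler(0.5, seed = 42L)))
    println(sampleAll(data, new SimplePoissonSampler(1.5, seed = 42L)))
  }
}
```

Because the sampler never sees the item itself, the per-item call can be inlined into a generated loop, which is the property the iterator-based method lacks.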
## How was this patch tested?
Tests are added to `RandomSamplerSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 random-sampler-no-iterator
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11578.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11578
----
commit 9ef2b9810a623e00a83ca78fd49604cb12b755c7
Author: Liang-Chi Hsieh <[email protected]>
Date: 2016-03-08T07:27:38Z
init import.
commit 4b37774aa52b5120c7d66eeb8f79d9bd16128656
Author: Liang-Chi Hsieh <[email protected]>
Date: 2016-03-08T09:51:58Z
Make it work.
commit 28fca54e3391c2255d43086c58d57c2f0e5b91fb
Author: Liang-Chi Hsieh <[email protected]>
Date: 2016-03-08T13:51:03Z
Add tests.
----