Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/1866#discussion_r16032164
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala
---
@@ -133,68 +133,64 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
* Return a subset of this RDD sampled by key (via stratified sampling).
*
* Create a sample of this RDD using variable sampling rates for different keys as specified by
- * `fractions`, a key to sampling rate map.
- *
- * If `exact` is set to false, create the sample via simple random sampling, with one pass
- * over the RDD, to produce a sample of size that's approximately equal to the sum of
- * math.ceil(numItems * samplingRate) over all key values; otherwise, use additional passes over
- * the RDD to create a sample size that's exactly equal to the sum of
+ * `fractions`, a key to sampling rate map, via simple random sampling with one pass over the
+ * RDD, to produce a sample of size that's approximately equal to the sum of
* math.ceil(numItems * samplingRate) over all key values.
*/
def sampleByKey(withReplacement: Boolean,
fractions: JMap[K, Double],
- exact: Boolean,
seed: Long): JavaPairRDD[K, V] =
- new JavaPairRDD[K, V](rdd.sampleByKey(withReplacement, fractions, exact, seed))
+ new JavaPairRDD[K, V](rdd.sampleByKey(withReplacement, fractions, seed))
/**
* Return a subset of this RDD sampled by key (via stratified sampling).
*
* Create a sample of this RDD using variable sampling rates for different keys as specified by
- * `fractions`, a key to sampling rate map.
- *
- * If `exact` is set to false, create the sample via simple random sampling, with one pass
- * over the RDD, to produce a sample of size that's approximately equal to the sum of
- * math.ceil(numItems * samplingRate) over all key values; otherwise, use additional passes over
- * the RDD to create a sample size that's exactly equal to the sum of
+ * `fractions`, a key to sampling rate map, via simple random sampling with one pass over the
+ * RDD, to produce a sample of size that's approximately equal to the sum of
* math.ceil(numItems * samplingRate) over all key values.
*
- * Use Utils.random.nextLong as the default seed for the random number generator
+ * Use Utils.random.nextLong as the default seed for the random number generator.
*/
def sampleByKey(withReplacement: Boolean,
- fractions: JMap[K, Double],
- exact: Boolean): JavaPairRDD[K, V] =
- sampleByKey(withReplacement, fractions, exact, Utils.random.nextLong)
+ fractions: JMap[K, Double]): JavaPairRDD[K, V] =
+ sampleByKey(withReplacement, fractions, Utils.random.nextLong)
/**
- * Return a subset of this RDD sampled by key (via stratified sampling).
+ * ::Experimental::
*
--- End diff --
Please remove this line (the blank ` *` line right after `::Experimental::`)
so that both `:: Experimental ::` and the first sentence show up in the
summary of the generated doc. Otherwise, only `:: Experimental ::` appears
in the summary.
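
For illustration, the requested doc-comment layout might look like the
sketch below: the `:: Experimental ::` tag is followed directly by the
first sentence, with no blank ` *` line in between. The object name
`SampleByKeyDocSketch` and the local-collection method are hypothetical
stand-ins so the snippet compiles on its own. The body is a simple
per-key Bernoulli filter over a `Seq`, not the code in this pull request.

```scala
import scala.util.Random

// Hypothetical stand-alone object; not part of Spark.
object SampleByKeyDocSketch {
  /**
   * :: Experimental ::
   * Return a subset of the pairs sampled by key (via stratified sampling).
   *
   * Assumption for this sketch: each pair is kept independently with the
   * probability listed for its key in `fractions`, in one pass over `data`.
   * Keys missing from `fractions` are treated as having rate 0.0.
   */
  def sampleByKey[K, V](
      data: Seq[(K, V)],
      fractions: Map[K, Double],
      seed: Long): Seq[(K, V)] = {
    val rng = new Random(seed)
    // nextDouble() is in [0.0, 1.0), so rate 1.0 keeps everything and 0.0 nothing.
    data.filter { case (k, _) => rng.nextDouble() < fractions.getOrElse(k, 0.0) }
  }
}
```

With this layout, the tag and the sentence after it belong to the same
opening paragraph of the comment, which is what the summary is taken from.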