Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2607#discussion_r19495258
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala ---
@@ -46,20 +47,63 @@ private[tree] object BaggedPoint {
* Convert an input dataset into its BaggedPoint representation,
* choosing subsample counts for each instance.
* Each subsample has the same number of instances as the original dataset,
- * and is created by subsampling with replacement.
- * @param input Input dataset.
- * @param numSubsamples Number of subsamples of this RDD to take.
- * @param seed Random seed.
- * @return BaggedPoint dataset representation
+ * and is created by subsampling without replacement.
+ * @param input Input dataset.
+ * @param subsample Fraction of the training data used for learning decision tree.
+ * @param numSubsamples Number of subsamples of this RDD to take.
+ * @param withReplacement Sampling with/without replacement.
+ * @param seed Random seed.
+ * @return BaggedPoint dataset representation.
*/
- def convertToBaggedRDD[Datum](
+ def convertToBaggedRDD[Datum] (
input: RDD[Datum],
+ subsample: Double,
--- End diff --
Rename: subsample --> subsamplingRate
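
For illustration only, a minimal sketch of how the signature might read after the rename. It assumes the parameter order given in the Scaladoc above; `BaggedPointSketch` and `convertToBagged` are hypothetical stand-ins that work on a local `Seq` rather than an `RDD`, and the Poisson/Bernoulli draws are an assumption of this sketch, not necessarily what the PR does.

```scala
import scala.util.Random

// Hypothetical stand-in for BaggedPoint: a datum plus one count per subsample.
case class BaggedPointSketch[Datum](datum: Datum, subsampleWeights: Array[Int])

object BaggedPointSketch {
  // Parameter order follows the Scaladoc in the diff, with `subsample`
  // renamed to `subsamplingRate`. Uses a local Seq so it runs without Spark.
  def convertToBagged[Datum](
      input: Seq[Datum],
      subsamplingRate: Double,
      numSubsamples: Int,
      withReplacement: Boolean,
      seed: Long): Seq[BaggedPointSketch[Datum]] = {
    require(subsamplingRate > 0.0 && subsamplingRate <= 1.0,
      s"subsamplingRate must be in (0, 1], got $subsamplingRate")
    val rng = new Random(seed)
    input.map { datum =>
      val weights = Array.fill(numSubsamples) {
        // Assumption: Poisson counts approximate sampling with replacement,
        // a Bernoulli draw approximates sampling without replacement.
        if (withReplacement) poisson(subsamplingRate, rng)
        else if (rng.nextDouble() < subsamplingRate) 1
        else 0
      }
      BaggedPointSketch(datum, weights)
    }
  }

  // Knuth's multiplication method for drawing a Poisson-distributed count.
  private def poisson(mean: Double, rng: Random): Int = {
    val limit = math.exp(-mean)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }
}
```

With the new name, a call such as `BaggedPointSketch.convertToBagged(data, subsamplingRate = 0.5, numSubsamples = 3, withReplacement = false, seed = 42L)` makes it clear that the second argument is a fraction rather than a subsample itself.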