Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/10601#discussion_r53891753
--- Diff: python/pyspark/mllib/tree.py ---
@@ -483,28 +519,35 @@ def trainClassifier(cls, data, categoricalFeaturesInfo,
Method to train a gradient-boosted trees model for
classification.
- :param data: Training dataset: RDD of LabeledPoint.
- Labels should take values {0, 1}.
- :param categoricalFeaturesInfo: Map storing arity of categorical
- features. E.g., an entry (n -> k) indicates that feature
- n is categorical with k categories indexed from 0:
- {0, 1, ..., k-1}.
- :param loss: Loss function used for minimization during gradient
- boosting. Supported: {"logLoss" (default),
- "leastSquaresError", "leastAbsoluteError"}.
- :param numIterations: Number of iterations of boosting.
- (default: 100)
- :param learningRate: Learning rate for shrinking the
- contribution of each estimator. The learning rate
- should be between in the interval (0, 1].
- (default: 0.1)
- :param maxDepth: Maximum depth of the tree. E.g., depth 0 means
- 1 leaf node; depth 1 means 1 internal node + 2 leaf
- nodes. (default: 3)
- :param maxBins: maximum number of bins used for splitting
- features (default: 32) DecisionTree requires maxBins >= max categories
- :return: GradientBoostedTreesModel that can be used for
- prediction
+ :param data:
+ Training dataset: RDD of LabeledPoint. Labels should take values
+ {0, 1}.
+ :param categoricalFeaturesInfo:
+ Map storing arity of categorical features. E.g., an entry (n ->
+ k) indicates that feature n is categorical with k categories
+ indexed from 0: {0, 1, ..., k-1}.
+ :param loss:
+ Loss function used for minimization during gradient boosting.
+ Supported values: {"logLoss", "leastSquaresError",
--- End diff --
If there's an agreed, Sphinx-required format for the longer version
(with descriptions for each supported value), we should use that. I think
the format 'Supported values: "foo", "bar".' is fine for the simpler version.
I'm slightly more in favor of "supported" over "allowed", but have no strong
opinion either way; whichever we pick should be used consistently.
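For illustration, a minimal sketch of the docstring style under discussion, using a hypothetical stub function (not the actual PySpark source) with the "Supported values:" format and a matching runtime check:

```python
def train_classifier(data, loss="logLoss"):
    """Train a gradient-boosted trees classifier (illustrative stub only).

    :param data:
        Training dataset.
    :param loss:
        Loss function used for minimization during gradient boosting.
        Supported values: "logLoss" (log loss, for classification),
        "leastSquaresError" (squared error), "leastAbsoluteError"
        (absolute error).
        (default: "logLoss")
    """
    supported = ("logLoss", "leastSquaresError", "leastAbsoluteError")
    if loss not in supported:
        # Error message mirrors the docstring wording for consistency.
        raise ValueError("Unsupported loss: %r. Supported values: %s"
                         % (loss, ", ".join(supported)))
    return {"data": data, "loss": loss}
```

The point is only that the short "Supported values: ..." sentence and a per-value parenthetical description can coexist in one :param: entry, which is one way to keep classification.py and regression.py consistent.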
On Tue, 23 Feb 2016 at 19:57, Bryan Cutler <[email protected]> wrote:
> In python/pyspark/mllib/tree.py
> <https://github.com/apache/spark/pull/10601#discussion_r53820120>:
>
> > - 1 leaf node; depth 1 means 1 internal node + 2 leaf
> > - nodes. (default: 3)
> > - :param maxBins: maximum number of bins used for splitting
> > - features (default: 32) DecisionTree requires maxBins >= max categories
> > - :return: GradientBoostedTreesModel that can be used for
> > - prediction
> > + :param data:
> > + Training dataset: RDD of LabeledPoint. Labels should take values
> > + {0, 1}.
> > + :param categoricalFeaturesInfo:
> > + Map storing arity of categorical features. E.g., an entry (n ->
> > + k) indicates that feature n is categorical with k categories
> > + indexed from 0: {0, 1, ..., k-1}.
> > + :param loss:
> > + Loss function used for minimization during gradient boosting.
> > + Supported values: {"logLoss", "leastSquaresError",
>
> I agree that we should have a consistent format for Supported Values or Allowed
> Values from classification.py and regression.py. The difference with the
> others is that they have a small description for each value too.
>
> @vijaykiran <https://github.com/vijaykiran> , it would be great to finish
> this up and get it merged. If you are unable to, I could take it from here,
> thanks!
>