GitHub user etrain commented on a diff in the pull request:
https://github.com/apache/spark/pull/886#discussion_r13982852
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
---
@@ -49,6 +49,7 @@ object DecisionTreeRunner {
case class Params(
input: String = null,
algo: Algo = Classification,
+ numClassesForClassification: Int = 2,
--- End diff --
Yeah, makes sense. If it doesn't complicate things too much, we might
consider adding an interface that leaves this unspecified and infers the
number of classes from the data in one pass.
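
A minimal sketch of what such one-pass inference might look like, on a plain
Scala collection rather than an RDD. The object and method names are
hypothetical and not part of MLlib, and the sketch assumes labels are encoded
as 0.0, 1.0, ..., k-1, so the number of classes is the largest label plus one:

```scala
// Hypothetical helper, not an MLlib API.
// Assumption: class labels are the non-negative whole numbers 0.0 .. k-1.
object InferNumClasses {
  def inferNumClasses(labels: Iterable[Double]): Int = {
    require(labels.nonEmpty, "cannot infer the number of classes from no data")
    // Single pass: validate each label and track the largest one seen so far.
    val maxLabel = labels.foldLeft(-1) { (maxSoFar, label) =>
      require(label >= 0 && label == math.floor(label),
        s"label $label is not a non-negative whole number")
      math.max(maxSoFar, label.toInt)
    }
    maxLabel + 1
  }
}
```

For example, `InferNumClasses.inferNumClasses(Seq(0.0, 1.0, 2.0, 1.0))` returns
3. The cost is one extra scan of the labels before training, which is the
trade-off discussed in the quoted reply below.
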
It's worth noting that in R, an object of type "factor" (the default for
categorical/label data) has this information built in. It can be a big pain
at load time while the system figures out the cardinality of the
factor, but it leads to a nice compact representation of the data and
eliminates situations like this one.
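
To illustrate the idea, here is a rough sketch of a factor-like encoding in
Scala. The `Factor` and `FactorEncoding` names are hypothetical and exist only
for this example. Levels are sorted, as R does by default, but the codes here
are 0-based whereas R's are 1-based:

```scala
// Hypothetical types for illustration; not an R or MLlib API.
// A factor stores the distinct levels once, plus one small integer per value.
final case class Factor(levels: Vector[String], codes: Vector[Int]) {
  // The cardinality is carried by the data itself, so nobody has to pass it in.
  def numLevels: Int = levels.length
}

object FactorEncoding {
  def encode(values: Seq[String]): Factor = {
    // Finding the distinct levels is the load-time cost mentioned above.
    val levels = values.distinct.sorted.toVector
    val indexOf = levels.zipWithIndex.toMap
    Factor(levels, values.map(indexOf).toVector)
  }
}
```

With this representation, `FactorEncoding.encode(Seq("b", "a", "b", "c"))`
yields levels `a, b, c` and codes `1, 0, 1, 2`, and the number of classes is
just `numLevels`.
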
I agree with doing the API separation in the ensembles PR.
On Thu, Jun 19, 2014 at 10:46 AM, manishamde <[email protected]>
wrote:
> In
> examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala:
>
> > @@ -49,6 +49,7 @@ object DecisionTreeRunner {
> > case class Params(
> > input: String = null,
> > algo: Algo = Classification,
> > + numClassesForClassification: Int = 2,
>
> Inference from a large dataset could take a lot of time. In general, most
> practitioners know the number of classes in advance. If not, we can add a
> pre-processing step.
>
> Currently, numClassesForClassification is our only classification-specific
> parameter. In general, I agree with you. At the same time, I didn't
> want to create more configuration classes for the user. Shall we leave it
> as is for now and handle it with the ensembles PR, where we will have more
> parameters (boosting iterations, num trees, feature subsetting, etc.)?
>
>