Github user manishamde commented on the pull request:

    https://github.com/apache/spark/pull/886#issuecomment-44361546
  
    @etrain Given our default 'maxBins' setting of 100, the maximum cardinality one could use for a categorical feature is 7. Also, the number of splits in the 'check for all' strategy will always be lower than the number of bins.
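    To make the arithmetic explicit — a quick sketch, assuming the exhaustive ("check all subsets") strategy considers 2^(M-1) - 1 candidate splits for an unordered categorical feature with M categories (the helper names below are illustrative, not Spark API):

    ```python
    # For an unordered categorical feature with M categories, the exhaustive
    # strategy considers 2^(M-1) - 1 candidate splits.
    def num_splits(m):
        return 2 ** (m - 1) - 1

    def max_cardinality(max_bins):
        # Largest M whose split count still fits within max_bins.
        m = 1
        while num_splits(m + 1) <= max_bins:
            m += 1
        return m

    print(num_splits(7))          # 63 splits, fits in 100 bins
    print(num_splits(8))          # 127 splits, exceeds 100 bins
    print(max_cardinality(100))   # 7
    ```

    With maxBins = 100, M = 7 gives 63 splits (fits) while M = 8 gives 127 (does not), hence the cardinality limit of 7.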
    
    We will be forced to use a heuristic if we support more than 10 categorical values. I am fine with using entropy (as @srowen suggested) -- a reference would be great if we can find one.
    
    It's always difficult to find a reference for practical issues since they 
are harder to publish. :-(

