Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1484#issuecomment-51168631
@mengxr
1) I also have concerns regarding the two options mentioned. Throwing an
error means having a method that fails even when it is called with valid
parameters. Calling `fit` inside `transform` raises the question of what a
subsequent `fit` call would do.
2) Could you explain how an upper bound like `[T <: Vectorized with
Labeled]` could be implemented? `LabeledPoint` is a case class with no class
hierarchy or traits.
3) It seems that all implementations of `transform` will do the same thing:
filter features by index. I propose implementing such a filter; it would also
solve the problem of filtering both `LabeledPoint` and `Vector`:
```scala
trait FeatureFilter {
  val indices: Set[Int]
  def transform(data: RDD[LabeledPoint]): RDD[LabeledPoint] =
    data.map { lp => new LabeledPoint(lp.label, Compress(lp.features, indices)) }
  def transform(data: RDD[Vector]): RDD[Vector] =
    data.map { v => Compress(v, indices) }
}

object Compress {
  def apply(features: Vector, indices: Set[Int]): Vector = {
    // keep only the values whose index is in the given set
    val (values, _) = features.toArray.zipWithIndex
      .filter { case (value, index) => indices.contains(index) }
      .unzip
    Vectors.dense(values.toArray)
  }
}

class ChiSquaredFeatureSelection(data: RDD[LabeledPoint], numFeatures: Int)
    extends FeatureFilter {
  // compute chi-squared statistics and select the feature indices to keep
  val indices: Set[Int] = {....}
}
```
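For illustration, the index-filtering step at the heart of `Compress` can be sketched without any Spark dependencies; plain Scala arrays stand in for MLlib's `Vector`, and the object and method names here are hypothetical:

```scala
// Minimal sketch of the Compress logic, assuming features are a plain
// Array[Double]: keep only the values whose index is in the given set,
// preserving their original order.
object CompressSketch {
  def compress(features: Array[Double], indices: Set[Int]): Array[Double] =
    features.zipWithIndex
      .filter { case (_, index) => indices.contains(index) }
      .map(_._1)

  def main(args: Array[String]): Unit = {
    val v = Array(1.0, 2.0, 3.0, 4.0)
    // keep features 0 and 2
    println(compress(v, Set(0, 2)).mkString(","))  // prints "1.0,3.0"
  }
}
```

The order-preserving `filter`/`unzip` pattern matters: the selected values must stay in their original positions relative to each other so that downstream consumers can map them back to the chosen indices.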