[GitHub] spark pull request: [SPARK-1212, Part II] Support sparse data in M...

mateiz Wed, 02 Apr 2014 13:25:24 -0700

Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/245#discussion_r11225671
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
    @@ -41,6 +39,107 @@ object MLUtils {
       }
     
       /**
    +   * Multiclass label parser, which parses a string into double.
    +   */
    +  val multiclassLabelParser: String => Double = _.toDouble
    +
    +  /**
    +   * Binary label parser, which outputs 1.0 (positive) if the value is greater than 0.5,
    +   * or 0.0 (negative) otherwise.
    +   */
    +  val binaryLabelParser: String => Double = label => if (label.toDouble > 0.5) 1.0 else 0.0
    +
    +  /**
    +   * Loads labeled data in the LIBSVM format into an RDD[LabeledPoint].
    +   * The LIBSVM format is a text-based format used by LIBSVM and LIBLINEAR.
    +   * Each line represents a labeled sparse feature vector using the following format:
    +   * {{{label index1:value1 index2:value2 ...}}}
    +   * where the indices are one-based and in ascending order.
    +   * This method parses each line into a [[org.apache.spark.mllib.regression.LabeledPoint]],
    +   * where the feature indices are converted to zero-based.
    +   *
    +   * @param sc Spark context
    +   * @param path file or directory path in any Hadoop-supported file system URI
    +   * @param labelParser parser for labels, default: 1.0 if label > 0.5 or 0.0 otherwise
    +   * @param numFeatures number of features, which will be determined from the input data if a
    +   *                    negative value is given. The default value is -1.
    +   * @param minSplits min number of partitions, default: sc.defaultMinSplits
    --- End diff --
    
    This method no longer has default values, right? (Or you'll add some other variants later.)
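
For reference, the line format and the documented label default quoted in the diff can be sketched as below. This is a minimal illustration only, not the code in the pull request: `LibSVMLineSketch` and `parseLibSVMLine` are hypothetical names, the function returns plain tuples instead of a `LabeledPoint` so that it runs without Spark, and it assumes well-formed, non-empty input lines. The default argument simply mirrors what the Scaladoc says; whether the real method keeps any defaults is exactly the question above.

```scala
object LibSVMLineSketch {
  // Same definition as binaryLabelParser in the quoted diff.
  val binaryLabelParser: String => Double =
    label => if (label.toDouble > 0.5) 1.0 else 0.0

  // Hypothetical helper: parses one "label index1:value1 index2:value2 ..."
  // line into (label, zero-based indices, values).
  // The default for labelParser mirrors the Scaladoc text, not necessarily
  // the final method signature.
  def parseLibSVMLine(
      line: String,
      labelParser: String => Double = binaryLabelParser
  ): (Double, Array[Int], Array[Double]) = {
    val items = line.trim.split(' ').filter(_.nonEmpty)
    val label = labelParser(items.head)
    val (indices, values) = items.tail.map { item =>
      val sep = item.indexOf(':')
      // Indices are one-based in the file and zero-based in memory.
      (item.substring(0, sep).toInt - 1, item.substring(sep + 1).toDouble)
    }.unzip
    (label, indices, values)
  }
}
```

With a default in place, a caller can write `LibSVMLineSketch.parseLibSVMLine("1 1:0.5 3:2.0")` for binary labels, or pass `_.toDouble` as the second argument for multiclass labels. Without defaults, every call site has to pass the parser explicitly, or the API needs overloaded variants.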



