Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8518#discussion_r38265215
  
    --- Diff: docs/ml-guide.md ---
    @@ -422,30 +402,19 @@ This example follows the simple text document `Pipeline` illustrated in the figu
     
     <div data-lang="scala">
     {% highlight scala %}
    -import org.apache.spark.{SparkConf, SparkContext}
     import org.apache.spark.ml.Pipeline
     import org.apache.spark.ml.classification.LogisticRegression
     import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
     import org.apache.spark.mllib.linalg.Vector
    -import org.apache.spark.sql.{Row, SQLContext}
    -
    -// Labeled and unlabeled instance types.
    -// Spark SQL can infer schema from case classes.
    -case class LabeledDocument(id: Long, text: String, label: Double)
    -case class Document(id: Long, text: String)
    -
    -// Set up contexts.  Import implicit conversions to DataFrame from sqlContext.
    -val conf = new SparkConf().setAppName("SimpleTextClassificationPipeline")
    -val sc = new SparkContext(conf)
    -val sqlContext = new SQLContext(sc)
    -import sqlContext.implicits._
    +import org.apache.spark.sql.Row
     
     // Prepare training documents, which are labeled.
    --- End diff --
    
    Might be useful to specify that the schema is (id, document, label), since I would expect a categorical label to be a discrete type (the 0th column) rather than a continuous one (the 2nd column).
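    For concreteness, a minimal sketch of the training-data preparation with the schema spelled out in a comment (the variable names, the `sqlContext` in scope, and the sample rows are assumptions for illustration, not part of the diff):
    
    {% highlight scala %}
    // Prepare training documents, which are labeled.
    // Schema: (id: Long, text: String, label: Double) -- note the label is
    // the *last* column and typed Double (0.0 / 1.0), even though it is
    // conceptually categorical.
    val training = sqlContext.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    )).toDF("id", "text", "label")
    {% endhighlight %}
    
    Stating the column order up front would prevent readers from assuming the discrete label comes first.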

