When you pass a DataFrame into the train method of LogisticRegression or another Spark ML algorithm, the data is extracted using the `labelCol` and `featuresCol` parameters, which should be set before calling the train method (they default to "label" and "features", respectively). The `featuresCol` column should be of type Vector, holding Double values. When the train method is called, it verifies that the `featuresCol` column is of type Vector and that the `labelCol` column is of type Double. It throws an exception if it finds any other data type.
Spark ML has special ways of handling features that are not inherently continuous or numerical. I urge you to review this question on StackOverflow which covers it quite well: http://stackoverflow.com/questions/32277576/spark-ml-categorical-features