Hi Yan, I think you'll have to map the features column to a new numerical features column.

Here's one way to do the individual transform:

    scala> val x = "[1, 2, 3, 4, 5]"
    x: String = [1, 2, 3, 4, 5]

    scala> val y:Array[Int] = x slice(1, x.length - 1) replace(",", "") split(" ") map(_.toInt)
    y: Array[Int] = Array(1, 2, 3, 4, 5)

If you don't know about the Scala command line, just type "scala" in a terminal window. It's a good place to try things out.

You can make a function out of this transformation and apply it to your features column to make a new column. Then add this with Dataset.withColumn.

See here <http://stackoverflow.com/questions/35227568/applying-function-to-spark-dataframe-column> on how to apply a function to a Column to make a new column.

On Tue, Sep 6, 2016 at 1:56 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:

> Hi,
> I have a csv file like:
> uid mid features label
> 123 5231 [0, 1, 3, ...] True
>
> Both "features" and "label" columns are used for GBTClassifier.
>
> However, when I read the file:
> Dataset<Row> samples = sparkSession.read().csv(file);
> The type of samples.select("features") is String.
>
> My question is:
> How to map samples.select("features") to Vector or any appropriate type,
> so I can use it to train like:
> GBTClassifier gbdt = new GBTClassifier()
> .setLabelCol("label")
> .setFeaturesCol("features")
> .setMaxIter(2)
> .setMaxDepth(7);
>
> Thanks.
>
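
Since your snippet is in Java, here is a rough Java sketch of the same per-value transform as the Scala one-liner. The class name FeatureParser and the method parseFeatures are made up for illustration and are not part of Spark. It assumes every value looks like "[0, 1, 3]" (square brackets, comma-separated numbers), and it returns doubles rather than ints because that is what a dense feature vector needs.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Spark.
class FeatureParser {

    // Turns a string such as "[0, 1, 3]" into {0.0, 1.0, 3.0}.
    // Assumes the input is wrapped in square brackets and comma-separated.
    static double[] parseFeatures(String raw) {
        String body = raw.trim();
        // Drop the surrounding brackets, like slice(1, x.length - 1) in Scala.
        body = body.substring(1, body.length() - 1).trim();
        if (body.isEmpty()) {
            return new double[0];
        }
        // Split on commas, tolerating optional whitespace around them.
        String[] tokens = body.split("\\s*,\\s*");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [1.0, 2.0, 3.0, 4.0, 5.0]
        System.out.println(Arrays.toString(parseFeatures("[1, 2, 3, 4, 5]")));
    }
}
```

The resulting double[] can be handed to org.apache.spark.ml.linalg.Vectors.dense inside whatever function you apply to the column. I have left the Spark wiring out of this sketch. Note that it does no validation, so an empty string or a non-numeric token will throw.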