I find this really confusing.
I can use Vectors.parse to create a DataFrame that contains a Vector column:
scala> val dataVec = Seq((0, Vectors.parse("[1,3,5]")), (1,
dataVec: org.apache.spark.sql.DataFrame = [_1: int, _2: vector]
But using map to convert the String column to a Vector throws an error:
scala> val dataStr = Seq((0, "[1,3,5]"), (1, "[2,4,6]")).toDF
dataStr: org.apache.spark.sql.DataFrame = [_1: int, _2: string]
scala> dataStr.map(row => Vectors.parse(row.getString(1)))
<console>:30: error: Unable to find encoder for type stored in a
Dataset. Primitive types (Int, String, etc) and Product types (case
classes) are supported by importing spark.implicits._ Support for
serializing other types will be added in future releases.
dataStr.map(row => Vectors.parse(row.getString(1)))
Can anyone help me?
Thanks very much!
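A possible explanation and workaround, offered as an untested sketch: `Dataset.map` needs an implicit `Encoder` for its result type. As the error message says, `spark.implicits._` provides encoders for primitives and Product types (tuples, case classes), but not for a bare `Vector`. The `Seq((0, Vectors.parse(...)), ...).toDF` line works because each element is a tuple, so returning a tuple from `map` should compile too. The sketch assumes `Vectors` is `org.apache.spark.mllib.linalg.Vectors` (the package that provides `parse`) and converts the result with `.asML`, because the `spark.ml` `GBTClassifier` expects `org.apache.spark.ml.linalg.Vector`. The object name `MapToVector` and the column names `id`/`features` are hypothetical.

```scala
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.sql.{DataFrame, SparkSession}

object MapToVector {
  // Parses the string column with Dataset.map, returning a tuple rather than a bare Vector.
  def vectorize(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val dataStr = Seq((0, "[1,3,5]"), (1, "[2,4,6]")).toDF
    // A tuple is a Product, so spark.implicits._ can derive its encoder (the same
    // mechanism that makes Seq((0, Vectors.parse(...)), ...).toDF work); .asML turns
    // the mllib vector into the spark.ml vector type used by spark.ml estimators.
    dataStr
      .map(row => (row.getInt(0), OldVectors.parse(row.getString(1)).asML))
      .toDF("id", "features")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("map-to-vector").getOrCreate()
    val dataVec = vectorize(spark)
    dataVec.printSchema()
    dataVec.show(false)
    spark.stop()
  }
}
```

A case class such as `case class Sample(id: Int, features: Vector)` should work the same way, since it is also a Product type.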
On Tue, Sep 6, 2016 at 9:58 PM, Peter Figliozzi <pete.figlio...@gmail.com> wrote:
> Hi Yan, I think you'll have to map the features column to a new numerical
> features column.
> Here's one way to do the individual transform:
> scala> val x = "[1, 2, 3, 4, 5]"
> x: String = [1, 2, 3, 4, 5]
> scala> val y: Array[Int] = x.slice(1, x.length - 1).replace(",", "").split(" ").map(_.toInt)
> y: Array[Int] = Array(1, 2, 3, 4, 5)
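Peter's one-liner can be turned into a reusable plain Scala function. The sketch below uses the hypothetical name `parseFeatures`. It splits on commas and trims each piece, so it accepts both `[1,3,5]` and `[1, 2, 3]`. It also returns `Double`s rather than `Int`s, because Spark ML feature vectors store doubles. It assumes well-formed numeric input; a malformed entry throws `NumberFormatException`.

```scala
object FeatureParsing {
  // Turns a bracketed, comma-separated string such as "[1, 2, 3]" into an Array[Double].
  def parseFeatures(s: String): Array[Double] = {
    val body = s.trim.stripPrefix("[").stripSuffix("]").trim
    if (body.isEmpty) Array.empty[Double]
    else body.split(",").map(_.trim.toDouble)
  }

  def main(args: Array[String]): Unit = {
    println(parseFeatures("[1, 2, 3, 4, 5]").mkString(" ")) // 1.0 2.0 3.0 4.0 5.0
    println(parseFeatures("[1,3,5]").mkString(" "))         // 1.0 3.0 5.0
  }
}
```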
> If you don't know about the Scala command line, just type "scala" in a
> terminal window. It's a good place to try things out.
> You can make a function out of this transformation and apply it to your
> features column to make a new column. Then add this with
> See here
> on how to apply a function to a Column to make a new column.
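Applying such a function to a column can be sketched with `org.apache.spark.sql.functions.udf` and `DataFrame.withColumn`. This keeps the work inside the DataFrame API, so no `Encoder[Vector]` is needed. The names `FeaturesColumn`, `toVector`, and `withFeatures` are hypothetical. The parser assumes non-null, non-empty, well-formed strings; a null cell would make the UDF throw.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object FeaturesColumn {
  // "[1, 3, 5]" -> dense spark.ml Vector (1.0, 3.0, 5.0).
  def toVector(s: String): Vector =
    Vectors.dense(s.trim.stripPrefix("[").stripSuffix("]").split(",").map(_.trim.toDouble))

  // The same function lifted into a UDF that maps a string Column to a vector Column.
  val toVectorUdf = udf(toVector _)

  // Adds a "features" column of vector type computed from the given string column.
  def withFeatures(df: DataFrame, stringCol: String): DataFrame =
    df.withColumn("features", toVectorUdf(col(stringCol)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("features-udf").getOrCreate()
    import spark.implicits._
    val dataStr = Seq((0, "[1,3,5]"), (1, "[2,4,6]")).toDF("id", "raw")
    withFeatures(dataStr, "raw").show(false)
    spark.stop()
  }
}
```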
> On Tue, Sep 6, 2016 at 1:56 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
>> I have a csv file like:
>> uid mid features label
>> 123 5231 [0, 1, 3, ...] True
>> Both "features" and "label" columns are used for GBTClassifier.
>> However, when I read the file:
>> Dataset<Row> samples = sparkSession.read().csv(file);
>> The type of samples.select("features") is String.
>> My question is:
>> How can I map samples.select("features") to a Vector or another appropriate type,
>> so that I can use it for training, like this:
>> GBTClassifier gbdt = new GBTClassifier()
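Putting the pieces together for the original question, here is a hedged Scala sketch; the Java API is analogous. It makes several assumptions about the file. It assumes a tab-separated file with a header row `uid mid features label` at the hypothetical path `samples.tsv`, and a `label` column holding `True`/`False`. With a comma-delimited file, the features field would need to be quoted, since it contains commas itself. Labels are mapped to 0.0/1.0 because `GBTClassifier` does binary classification on a double label column. The names `TrainGbt`, `toVector`, and `prepare` are hypothetical.

```scala
import org.apache.spark.ml.classification.GBTClassifier
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lower, udf, when}

object TrainGbt {
  // "[0, 1, 3]" -> dense spark.ml Vector; assumes non-null, well-formed input.
  def toVector(s: String): Vector =
    Vectors.dense(s.trim.stripPrefix("[").stripSuffix("]").split(",").map(_.trim.toDouble))

  // Converts the string columns read from the file into the types GBTClassifier expects:
  // a vector "features" column and a 0.0/1.0 double "label" column.
  def prepare(raw: DataFrame): DataFrame = {
    val toVectorUdf = udf(toVector _)
    raw
      .withColumn("features", toVectorUdf(col("features")))
      .withColumn("label", when(lower(col("label")) === "true", 1.0).otherwise(0.0))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("train-gbt").getOrCreate()
    val raw = spark.read
      .option("header", "true")
      .option("delimiter", "\t")
      .csv("samples.tsv") // hypothetical path
    val samples = prepare(raw)

    val gbt = new GBTClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setMaxIter(10)
    val model = gbt.fit(samples)
    println(s"Trained ${model.trees.length} trees")
    spark.stop()
  }
}
```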