I don't know how to construct `array<struct<category:string,weight:string>>`. Could anyone help me?
I tried to get the array with:

scala> mblog_tags.map(_.getSeq[(String, String)](0))

but the result is:

res40: org.apache.spark.sql.Dataset[Seq[(String, String)]] = [value: array<struct<_1:string,_2:string>>]

How do I express `struct<string, string>`?

On Thu, Oct 20, 2016 at 4:34 PM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
> Hi, I want to extract the attribute `weight` of an array, and combine them
> to construct a sparse vector.
>
> ### My data is like this:
>
> scala> mblog_tags.printSchema
> root
>  |-- category.firstCategory: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- category: string (nullable = true)
>  |    |    |-- weight: string (nullable = true)
>
> scala> mblog_tags.show(false)
> +------------------------------------------------+
> |category.firstCategory                          |
> +------------------------------------------------+
> |[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]|
> |[[tagCategory_029, 0.9]]                        |
> |[[tagCategory_029, 0.8]]                        |
> +------------------------------------------------+
>
> ### And expected:
> Vectors.sparse(100, Array(60, 29), Array(0.8, 0.7))
> Vectors.sparse(100, Array(29), Array(0.9))
> Vectors.sparse(100, Array(29), Array(0.8))
>
> How to iterate an array in DataFrame?
> Thanks.
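In Spark SQL, each element of an `array<struct<...>>` column is handed back as a `Row`, not as a Scala tuple. That is why asking for `Seq[(String, String)]` yields the `_1`/`_2` schema above, and it would most likely fail with a `ClassCastException` once the Dataset is actually evaluated. The usual route is `getSeq[Row]` followed by `getString` on each element. The sketch below isolates the per-row conversion in plain Scala so it can be checked without a cluster. It assumes that the vector index is the numeric suffix of the category name (`tagCategory_060` -> 60), that weights are decimal strings, and that the size is 100 as in the quoted example. `TagVectors`, `categoryIndex` and `toSparse` are hypothetical names. Indices are sorted ascending because sparse vectors expect increasing indices, so the first row comes out as indices (29, 60) rather than (60, 29). Null elements (the schema allows them) are not handled.

```scala
// Pure-Scala sketch of the per-row conversion (no Spark dependency).
// Assumption: index = numeric suffix of the category name; weight = decimal string.
object TagVectors {

  /** Hypothetical helper: parses "tagCategory_060" into 60 (digits after the last '_'). */
  def categoryIndex(category: String): Int =
    category.substring(category.lastIndexOf('_') + 1).toInt

  /**
   * Hypothetical helper: turns one row's (category, weight) pairs into the
   * (size, indices, values) triple of a sparse vector, with indices sorted ascending.
   */
  def toSparse(size: Int, tags: Seq[(String, String)]): (Int, Array[Int], Array[Double]) = {
    val pairs = tags
      .map { case (category, weight) => (categoryIndex(category), weight.toDouble) }
      .sortBy(_._1)
    require(pairs.forall { case (i, _) => i >= 0 && i < size }, s"index out of range for size $size")
    (size, pairs.map(_._1).toArray, pairs.map(_._2).toArray)
  }
}

// Possible Spark wiring (untested sketch, Spark 2.x assumed):
//   import org.apache.spark.ml.linalg.Vectors
//   import org.apache.spark.sql.Row
//   import org.apache.spark.sql.functions.{col, udf}
//   // Each struct element arrives as a Row, so the UDF takes Seq[Row].
//   val toVec = udf { (tags: Seq[Row]) =>
//     val (n, idx, vals) = TagVectors.toSparse(100, tags.map(r => (r.getString(0), r.getString(1))))
//     Vectors.sparse(n, idx, vals)
//   }
//   // Backticks are needed because the column name itself contains a dot.
//   mblog_tags.withColumn("features", toVec(col("`category.firstCategory`")))
```

A UDF over the column is used in the commented wiring instead of `Dataset.map`, because `spark.implicits._` does not, as far as I know, provide an encoder for `ml.linalg.Vector`, while a UDF can return one directly.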