How about changing the schema from

    root
     |-- category.firstCategory: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- category: string (nullable = true)
     |    |    |-- weight: string (nullable = true)

to:

    root
     |-- category: string (nullable = true)
     |-- weight: string (nullable = true)

2016-10-21
lk_spark

From: 颜发才(Yan Facai) <yaf...@gmail.com>
Sent: 2016-10-21 15:35
Subject: Re: How to iterate the element of an array in DataFrame?
To: "user.spark" <user@spark.apache.org>
Cc:

I don't know how to construct `array<struct<category:string,weight:string>>`. Could anyone help me?

I tried to get the array with:

    scala> mblog_tags.map(_.getSeq[(String, String)](0))

but the result is:

    res40: org.apache.spark.sql.Dataset[Seq[(String, String)]] = [value: array<struct<_1:string,_2:string>>]

How can I express `struct<string, string>`?

On Thu, Oct 20, 2016 at 4:34 PM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:

Hi,

I want to extract the attribute `weight` from each element of an array, and combine the values to construct a sparse vector.

### My data is like this:

    scala> mblog_tags.printSchema
    root
     |-- category.firstCategory: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- category: string (nullable = true)
     |    |    |-- weight: string (nullable = true)

    scala> mblog_tags.show(false)
    +------------------------------------------------+
    |category.firstCategory                          |
    +------------------------------------------------+
    |[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]|
    |[[tagCategory_029, 0.9]]                        |
    |[[tagCategory_029, 0.8]]                        |
    +------------------------------------------------+

### And expected:

    Vectors.sparse(100, Array(60, 29), Array(0.8, 0.7))
    Vectors.sparse(100, Array(29), Array(0.9))
    Vectors.sparse(100, Array(29), Array(0.8))

How do I iterate over an array in a DataFrame?

Thanks.
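
For reference, here is a minimal sketch of one way the two ideas in this thread could be written, not a confirmed answer from the list. It assumes Spark 2.x with `org.apache.spark.ml.linalg`, assumes the vector index is the numeric suffix of the tag name (`tagCategory_060` becomes 60), and reads each array element as a `Row` rather than a tuple, since struct elements come back as `Row`. The object and method names (`TagVectors`, `tagIndex`, `toSparse`, `rowToVector`, `vectors`, `flatten`) are made up for illustration.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.{col, explode}

object TagVectors {
  // Assumption: the index is the number after the last underscore,
  // e.g. "tagCategory_060" -> 60.
  def tagIndex(category: String): Int =
    category.substring(category.lastIndexOf('_') + 1).toInt

  // Build one sparse vector from (category, weight) string pairs.
  // The Seq[(Int, Double)] overload of Vectors.sparse is used so the
  // indices do not have to be supplied in sorted order.
  def toSparse(size: Int, tags: Seq[(String, String)]): Vector =
    Vectors.sparse(size, tags.map { case (c, w) => (tagIndex(c), w.toDouble) })

  // Iterate the array in column 0 of a row. Each element is a struct,
  // which Spark hands back as a Row; fields are read by position
  // (0 = category, 1 = weight, following the printSchema order).
  def rowToVector(size: Int)(row: Row): Vector = {
    val elems = Option(row.getSeq[Row](0)).getOrElse(Seq.empty[Row])
    val pairs = elems.filter(_ != null).map(e => (e.getString(0), e.getString(1)))
    toSparse(size, pairs.toSeq)
  }

  // One vector per input row. Going through the RDD avoids needing an
  // Encoder for Vector.
  def vectors(df: DataFrame, size: Int): RDD[Vector] =
    df.rdd.map(rowToVector(size))

  // The flattened schema suggested at the top of the thread: one output
  // row per array element. The backticks are needed because the column
  // name itself contains a dot.
  def flatten(df: DataFrame): DataFrame =
    df.select(explode(col("`category.firstCategory`")).as("tag"))
      .select("tag.category", "tag.weight")
}
```

With the data above, `TagVectors.vectors(mblog_tags, 100)` would give one sparse vector per row. `TagVectors.flatten(mblog_tags)` produces the `category`/`weight` schema, but it loses the grouping by original row, so it is probably less useful when one vector per row is wanted.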