How about changing the schema from:
root
 |-- category.firstCategory: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- category: string (nullable = true)
 |    |    |-- weight: string (nullable = true)

to:

root
 |-- category: string (nullable = true)
 |-- weight: string (nullable = true)
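
One way to get from the first schema to the second is `explode`, which emits one output row per array element. The following is only a minimal sketch for spark-shell on Spark 2.x, where `spark` is predefined. The `Tag` and `Record` case classes and the sample rows are hypothetical stand-ins for the real `mblog_tags` data.

```scala
// Paste into spark-shell (Spark 2.x); `spark` is assumed to be predefined there.
import org.apache.spark.sql.functions.{col, explode}
import spark.implicits._

// Hypothetical stand-in for the real data: one array of (category, weight) structs per row.
case class Tag(category: String, weight: String)
case class Record(tags: Seq[Tag])

val mblog_tags = Seq(
  Record(Seq(Tag("tagCategory_060", "0.8"), Tag("tagCategory_029", "0.7"))),
  Record(Seq(Tag("tagCategory_029", "0.9")))
).toDF("category.firstCategory")

// The column name contains a dot, so it has to be quoted with backticks;
// otherwise Spark reads it as field `firstCategory` of a column `category`.
val flat = mblog_tags
  .select(explode(col("`category.firstCategory`")).as("tag")) // one row per array element
  .select(col("tag.category"), col("tag.weight"))             // pull the struct fields up

flat.printSchema()
```

Note that exploding drops the grouping by original row, so it suits per-tag processing better than building one vector per row.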

2016-10-21 

lk_spark 



From: 颜发才(Yan Facai) <yaf...@gmail.com>
Sent: 2016-10-21 15:35
Subject: Re: How to iterate the element of an array in DataFrame?
To: "user.spark"<user@spark.apache.org>
Cc:

I don't know how to construct `array<struct<category:string,weight:string>>`.
Could anyone help me?


I tried to get the array with:
scala> mblog_tags.map(_.getSeq[(String, String)](0))

but the result is:
res40: org.apache.spark.sql.Dataset[Seq[(String, String)]] = [value: array<struct<_1:string,_2:string>>]




How can I express `struct<string, string>`?
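
A struct inside a `Row` comes back as a nested `Row`, not as a Scala tuple, so one option is to read the column with `getSeq[Row]` and pick the fields by name. The sketch below assumes spark-shell on Spark 2.x with `spark` predefined and no null elements in the array; `Tag`, `Record` and the sample row are hypothetical stand-ins for the real data.

```scala
// Paste into spark-shell (Spark 2.x); `spark` is assumed to be predefined there.
import org.apache.spark.sql.Row
import spark.implicits._

// Hypothetical stand-in for the real mblog_tags DataFrame.
case class Tag(category: String, weight: String)
case class Record(tags: Seq[Tag])

val mblog_tags = Seq(
  Record(Seq(Tag("tagCategory_060", "0.8"), Tag("tagCategory_029", "0.7")))
).toDF("category.firstCategory")

// Each array element is a Row; read its fields by name and convert to a tuple explicitly.
// Assumption: the array contains no null elements.
val pairs = mblog_tags.map { row =>
  row.getSeq[Row](0).map { tag =>
    (tag.getAs[String]("category"), tag.getAs[String]("weight"))
  }
}

pairs.show(false)
```

`getSeq[(String, String)]` compiles only because the type parameter is erased at runtime; the elements are still `Row` objects, so treating them as tuples would be expected to fail with a `ClassCastException`.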






On Thu, Oct 20, 2016 at 4:34 PM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:

Hi, I want to extract the `weight` attribute from each element of an array and 
combine the values to construct a sparse vector. 



### My data looks like this:

scala> mblog_tags.printSchema
root
 |-- category.firstCategory: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- category: string (nullable = true)
 |    |    |-- weight: string (nullable = true)


scala> mblog_tags.show(false)
+--------------------------------------------------------------+
|category.firstCategory                                        |
+--------------------------------------------------------------+
|[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]              |
|[[tagCategory_029, 0.9]]                                      |
|[[tagCategory_029, 0.8]]                                      |
+--------------------------------------------------------------+



### Expected result:
Vectors.sparse(100, Array(60, 29),  Array(0.8, 0.7))
Vectors.sparse(100, Array(29),  Array(0.9))
Vectors.sparse(100, Array(29),  Array(0.8))


How can I iterate over an array in a DataFrame?

Thanks.
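
One possible approach to the question above is a UDF that receives the array as `Seq[Row]` and returns an ML vector. This is a sketch under stated assumptions, not a confirmed solution from the thread: it assumes spark-shell on Spark 2.x with `spark` predefined, that every category has the form `tagCategory_NNN` with `NNN` as the vector index, that weights parse as doubles, and that the vector size is 100 as in the expected result. `Tag`, `Record`, `toSparse` and the sample rows are hypothetical names and data.

```scala
// Paste into spark-shell (Spark 2.x); `spark` is assumed to be predefined there.
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

// Hypothetical stand-in for the real mblog_tags DataFrame.
case class Tag(category: String, weight: String)
case class Record(tags: Seq[Tag])

val mblog_tags = Seq(
  Record(Seq(Tag("tagCategory_060", "0.8"), Tag("tagCategory_029", "0.7"))),
  Record(Seq(Tag("tagCategory_029", "0.9")))
).toDF("category.firstCategory")

// Assumptions: index = numeric suffix of the category name, vector size = 100,
// no null elements and no duplicate categories within one row.
val toSparse = udf { (tags: Seq[Row]) =>
  val elements = tags.map { tag =>
    val index = tag.getAs[String]("category").stripPrefix("tagCategory_").toInt
    val weight = tag.getAs[String]("weight").toDouble
    (index, weight)
  }
  Vectors.sparse(100, elements) // this overload sorts the pairs by index
}

val features = mblog_tags.select(
  toSparse(col("`category.firstCategory`")).as("features")
)

features.show(false)
```

Because the `Seq[(Int, Double)]` overload of `Vectors.sparse` sorts by index, the first row is stored with indices (29, 60) rather than (60, 29); the vector it represents is the same.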
