My expectation is:
root
|-- tag: vector

Namely, I want to convert:
[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]
to:
Vectors.sparse(100, Array(60, 29),  Array(0.8, 0.7))

I believe it needs two steps:
1. val tag2vec = {tag: Array[Structure] => Vector}
2. mblog_tags.withColumn("vec", tag2vec(col("tag")))

However, I have no idea how to describe the Array[Structure] in the
DataFrame.
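
One possible way to write the two steps, as a minimal sketch rather than a definitive answer: Spark hands an array<struct<...>> column to a Scala UDF as Seq[Row], so the UDF can take Seq[Row] and read the struct fields by name. The helper name parseTag, the "tagCategory_" prefix handling, and the column name "tag" are assumptions made for illustration; mblog_tags is the DataFrame from this thread. Note that the API docs for Vectors.sparse(size, indices, values) require strictly increasing indices, so the sketch uses the overload that takes unordered (index, value) pairs instead.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper: ("tagCategory_060", "0.8") -> (60, 0.8).
// Assumes every category has the form "tagCategory_<number>".
def parseTag(category: String, weight: String): (Int, Double) =
  (category.stripPrefix("tagCategory_").toInt, weight.toDouble)

// An array<struct<category:string, weight:string>> column arrives as Seq[Row].
val tag2vec = udf { (tags: Seq[Row]) =>
  val pairs = Option(tags).getOrElse(Seq.empty[Row]) // the array is nullable
    .filter(_ != null)                               // so are its elements
    .map(r => parseTag(r.getAs[String]("category"), r.getAs[String]("weight")))
  // This overload sorts the unordered (index, value) pairs by index.
  // It assumes each category appears at most once per row.
  // The size 100 is taken from the expected output above.
  Vectors.sparse(100, pairs): Vector
}

// Assumes the array column has been renamed to "tag".
val withVec = mblog_tags.withColumn("vec", tag2vec(col("tag")))
```

If the column still has its original name, the dot has to be escaped with backticks, e.g. col("`category.firstCategory`"), otherwise Spark reads it as a field access on a column called category.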





On Fri, Oct 21, 2016 at 4:51 PM, lk_spark <lk_sp...@163.com> wrote:

> How about changing the schema from
> root
>  |-- category.firstCategory: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- category: string (nullable = true)
>  |    |    |-- weight: string (nullable = true)
> to:
>
> root
>  |-- category: string (nullable = true)
>  |-- weight: string (nullable = true)
>
> 2016-10-21
> ------------------------------
> lk_spark
> ------------------------------
>
> *From:* 颜发才(Yan Facai) <yaf...@gmail.com>
> *Sent:* 2016-10-21 15:35
> *Subject:* Re: How to iterate the element of an array in DataFrame?
> *To:* "user.spark" <user@spark.apache.org>
> *Cc:*
>
> I don't know how to construct `array<struct<category:string,
> weight:string>>`.
> Could anyone help me?
>
> I tried to get the array with:
> scala> mblog_tags.map(_.getSeq[(String, String)](0))
>
> but the result is:
> res40: org.apache.spark.sql.Dataset[Seq[(String, String)]] = [value:
> array<struct<_1:string,_2:string>>]
>
>
> How can I express `struct<string, string>`?
>
>
>
> On Thu, Oct 20, 2016 at 4:34 PM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
>
>> Hi, I want to extract the `weight` attribute of each element in an array
>> and combine them into a sparse vector.
>>
>> ### My data is like this:
>>
>> scala> mblog_tags.printSchema
>> root
>>  |-- category.firstCategory: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- category: string (nullable = true)
>>  |    |    |-- weight: string (nullable = true)
>>
>>
>> scala> mblog_tags.show(false)
>> +--------------------------------------------------------------+
>> |category.firstCategory                                        |
>> +--------------------------------------------------------------+
>> |[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]              |
>> |[[tagCategory_029, 0.9]]                                      |
>> |[[tagCategory_029, 0.8]]                                      |
>> +--------------------------------------------------------------+
>>
>>
>> ### And expected:
>> Vectors.sparse(100, Array(60, 29),  Array(0.8, 0.7))
>> Vectors.sparse(100, Array(29),  Array(0.9))
>> Vectors.sparse(100, Array(29),  Array(0.8))
>>
>> How can I iterate over an array in a DataFrame?
>> Thanks.
>>
>>
>>
>>
>
